Session 8: Application Programming Interface (APIs)

Introduction to Web Scraping and Data Management for Social Scientists

Johannes B. Gruber

2024-07-31

Introduction

This Course

Day  Session
1    Introduction
2    Data Structures and Wrangling
3    Working with Files
4    Linking and joining data & SQL
5    Scaling, Reporting and Database Software
6    Introduction to the Web
7    Static Web Pages
8    Application Programming Interface (APIs)
9    Interactive Web Pages
10   Building a Reproducible Research Project

The Plan for Today

In this session, we learn how to adopt data from someone else. We will:

  • Learn what an API is and what parts it consists of
  • Learn about httr2, a modern, intuitive package for communicating with APIs
  • Discuss some examples:
    • A simple first API: The Guardian API
    • UK Parliament API
    • Semantic Scholar API
  • Go into a bit more detail on requesting raw data

Original Image Source: prowebscraper.com

What are APIs?

What is an Application Programming Interface (API)?

  • An Application Programming Interface (API) is a way for two computer programs to speak to each other
  • In modern software development they are used extensively when:
    • two programs are not on the same machine
    • two applications are not in the same language
    • the inner workings of a piece of software should be hidden, but its functionality is offered for customization
    • a graphical user interface would be inconvenient at scale
  • There are several important types (SOAP, GraphQL, etc.), but we will focus on REST (Representational State Transfer) APIs
  • Commonly used to distribute data, among many other things
  • A few prominent examples:
    • the Twitter and Facebook APIs (both effectively defunct)
    • the ChatGPT API, which is used to build many additional services
    • news APIs like The Guardian and NYT
    • financial APIs
    • translation APIs (Google, Bing and DeepL)

Parts of an API call

API calls usually combine several elements:

  • a base URL of the service (e.g., https://api.openai.com/)
  • an endpoint for a specific service, usually accessed through a sub-directory (e.g., /v1/completions)
  • an API method: GET, POST, PUT, DELETE, etc. (only GET and sometimes POST are important for us)
  • headers containing some settings, e.g., what format you want to receive the data in (JSON, XML, HTML, etc.), and information about who you are (user agent, cookies, device and software information), which is usually used for debugging
  • query parameters, i.e., your search term, filters, what fields/columns you want to access, how many results you want to receive, how results are ordered, etc. (e.g., ?q=parliament%20AND%20debate)
  • a body, if your request contains more complicated instructions (usually not for GET requests)
  • authentication, usually in the form of a token (a string, similar to a password)
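
To make the pieces above more concrete, here is a minimal sketch in base R that glues a base URL, an endpoint and an encoded query parameter into one URL. The service https://api.example.com and the /search endpoint are hypothetical placeholders; only the search term comes from the bullet above.

```r
# Hypothetical service and endpoint, used only to illustrate the URL anatomy
base_url <- "https://api.example.com"
endpoint <- "/search"
query    <- "parliament AND debate"   # search term from the bullet above

# Percent-encode the query so that spaces become %20
encoded <- utils::URLencode(query, reserved = TRUE)

# Glue the parts together: base URL + endpoint + ?parameter=value
url <- paste0(base_url, endpoint, "?q=", encoded)
url
```

In practice, httr2 takes care of this encoding, so the URL rarely needs to be assembled by hand.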

Parts of an API response

APIs respond to a call. The response usually also contains several elements:

  • a status code: 2xx means success, 3xx means redirection (the resource lives somewhere else), 4xx are client errors (e.g., 404 not found, 403 forbidden), 5xx are server errors
  • headers provide additional information about the response (e.g., type of data returned, size of the data, timestamp)
  • body: the main response containing the requested data
  • response metadata: more information about the response (e.g., pagination information, version numbers, remaining rate limit allowance, link to next page)
  • error messages: when unsuccessful, the API might include an error message on top of the status code
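
As a small illustration of the status code bullet, the following base R sketch maps codes to the standard HTTP status classes. The function name status_class() is made up for this example and is not part of any package.

```r
# Hypothetical helper: map HTTP status codes to their standard class
status_class <- function(code) {
  classes <- c("informational", "success", "redirection",
               "client error", "server error")
  idx <- code %/% 100                 # first digit of the status code
  valid <- idx >= 1 & idx <= 5
  # clamp the index so invalid codes do not cause out-of-range lookups
  ifelse(valid, classes[pmin(pmax(idx, 1), 5)], "unknown")
}

status_class(c(200, 301, 404, 500))
```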

Accessing APIs from R

The httr2 package

  • a rewrite of the httr package, which was the de facto default for developing API packages in R
  • developed by Hadley Wickham
  • tidyverse programming principles
    • descriptive verbs are used in a pipe
    • requests are built up using req_* functions
    • responses are deconstructed using resp_*
    • makes wrapping an API in a few functions or a package straightforward

Example: The Guardian API

Background

  • The newspaper The Guardian offers all its articles through an open API for free 🤓
  • To access the API, you first need to obtain an API key by filling out a small form here
  • The API key should arrive by email within seconds
  • Unfortunately, such open access is very rare in the world of news media ☹️
  • To figure out how to use the API, we can use its documentation

Your task: get a key and use usethis::edit_r_environ(scope = "project") to open your .Renviron file. Save the API key as the variable GUARDIAN_KEY.
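
Once the key is stored, it can be read back with Sys.getenv(). The sketch below wraps this in a small helper that fails early if the variable is missing; the function name get_guardian_key() is an assumption made for this example. In the .Renviron file, the entry is a single line of the form GUARDIAN_KEY=your-key-here, and R needs to be restarted (or readRenviron(".Renviron") called) before the variable becomes visible.

```r
# Hypothetical helper: read the API key and fail early if it is missing
get_guardian_key <- function() {
  key <- Sys.getenv("GUARDIAN_KEY")
  if (!nzchar(key)) {
    stop("GUARDIAN_KEY is not set; add it to .Renviron and restart R",
         call. = FALSE)
  }
  key
}
```

Calling get_guardian_key() instead of Sys.getenv("GUARDIAN_KEY") directly means a missing key produces a clear error before any request is sent.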

Building Requests

Let’s build our first httr2 request!

library(httr2)
library(tidyverse, warn.conflicts = FALSE)
req <- request("https://content.guardianapis.com") |>  # start the request with the base URL
  req_url_path("search") |>                            # navigate to the endpoint you want to access
  req_method("GET") |>                                 # specify the method
  req_timeout(seconds = 60) |>                         # how long to wait for a response
  req_headers("User-Agent" = "httr2 guardian test") |> # specify request headers
  # req_body_json() |>                                 # since this is a GET request the body stays empty
  req_url_query(                                       # instead the query is added to the URL
    q = "parliament AND debate",
    "show-blocks" = "all"
  ) |>
  req_url_query(                                       # in this case, the API key is also added to the query
    "api-key" = Sys.getenv("GUARDIAN_KEY")             # but httr2 also has req_auth_* functions for other
  )                                                    # authentication procedures
print(req)

We have now built the request. But it doesn’t do anything until you also perform it.

Performing the request

resp <- req_perform(req)
resp

Printing the response tells us several important things:

  • the status of the response is OK (hurray!)
  • the response carries data in the JSON format
  • however, you probably don’t want to manually inspect each response…

Parsing the response: a first look

We can automatically check if the response has the form we expect:

resp_status(resp) < 400
[1] TRUE
resp_content_type(resp) == "application/json"
[1] TRUE
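
The two checks above can be bundled into one helper. The sketch below tries it on mock responses created locally with httr2’s response() constructor, so no network access is needed; the helper name response_ok() is made up for this example.

```r
library(httr2)

# Hypothetical helper: TRUE only if the status is below 400 and the body is JSON
response_ok <- function(resp) {
  resp_status(resp) < 400 &&
    identical(resp_content_type(resp), "application/json")
}

# Mock responses built locally (nothing is sent over the network)
good <- response(status_code = 200,
                 headers = list("Content-Type" = "application/json"))
bad  <- response(status_code = 404,
                 headers = list("Content-Type" = "text/html"))

response_ok(good)
response_ok(bad)
```

Note that req_perform() by default already turns 4xx and 5xx responses into R errors, so a check like this matters most when that behaviour has been changed with req_error().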

If we’re happy with the status of the response, we can start to look at the body by transforming it with the correct resp_body_* function:

returned_body <- resp_body_json(resp)
lobstr::tree(returned_body, max_length = 25)
<list>
└─response: <list>
  ├─status: "ok"
  ├─userTier: "developer"
  ├─total: 30810
  ├─startIndex: 1
  ├─pageSize: 10
  ├─currentPage: 1
  ├─pages: 3081
  ├─orderBy: "relevance"
  └─results: <list>
    ├─<list>
    │ ├─id: "australia-news/2023/nov/15/peter..."
    │ ├─type: "article"
    │ ├─sectionId: "australia-news"
    │ ├─sectionName: "Australia news"
    │ ├─webPublicationDate: "2023-11-15T07:19:09Z"
    │ ├─webTitle: "Peter Dutton accused of ‘weaponi..."
    │ ├─webUrl: "https://www.theguardian.com/aust..."
    │ ├─apiUrl: "https://content.guardianapis.com..."
    │ ├─blocks: <list>
    │ │ ├─main: <list>
    │ │ │ ├─id: "65546f008f0894c5322391dd"
    │ │ │ ├─bodyHtml: "<figure class="element element-a..."
    │ │ │ ├─bodyTextSummary: ""
... 

Parsing the response: a first look

We already see some useful information about the result. We could extract that information either with pluck() from purrr (part of the tidyverse) or using square brackets:

pluck(returned_body, "response", "total")
[1] 30810
pluck(returned_body, "response", "pageSize")
[1] 10
pluck(returned_body, "response", "pages")
[1] 3081
returned_body[["response"]][["total"]]
[1] 30810
returned_body[["response"]][["pageSize"]]
[1] 10
returned_body[["response"]][["pages"]]
[1] 3081
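
The three numbers appear to be related: the sketch below assumes that pages is simply total divided by pageSize, rounded up, which matches the values printed above. It uses a small mock list with the same structure instead of the real response.

```r
# Mock list mimicking the structure and values of the response shown above
mock_body <- list(
  response = list(total = 30810, pageSize = 10, pages = 3081)
)

total     <- mock_body[["response"]][["total"]]
page_size <- mock_body[["response"]][["pageSize"]]

# Assumption: the number of pages is total / pageSize, rounded up
pages_expected <- ceiling(total / page_size)
pages_expected == mock_body[["response"]][["pages"]]
```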

Parsing the response: extracting the data

So far, we have only retrieved page 1 of the results. Splitting results across pages (pagination) is a common way for APIs to return data. To get the other pages, we would need to loop through all of them (by adding the query parameter page = i). For now, we can have a closer look at the articles on the first results page.

search_res <- pluck(returned_body, "response", "results")

We can have a closer look at this using the Viewer in RStudio:

View(search_res)

As is typical, this API returns the data in a deeply nested format. This is probably the main reason why people dislike working with APIs in R: it can be very frustrating to get such data into a format that makes sense for us.
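
Coming back to the pagination mentioned on this slide: one way to prepare the loop is to create one request per page by adding the page parameter. This is only a sketch. The helper page_requests() is made up for this example, the API key is left out, and nothing is sent until the requests are performed.

```r
library(httr2)

# Base request without the API key (it would be added as in the earlier example)
base_req <- request("https://content.guardianapis.com") |>
  req_url_path("search") |>
  req_url_query(q = "parliament AND debate")

# Hypothetical helper: build one request per results page
page_requests <- function(req, pages) {
  lapply(pages, function(i) req_url_query(req, page = i))
}

reqs <- page_requests(base_req, 1:3)
length(reqs)
reqs[[2]]$url   # the URL of the second request now carries page=2
```

With a valid key added to the query, lapply(reqs, req_perform) would then send the requests one after another. With 3081 pages, it makes sense to start with a handful and to check the API’s rate limits first.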

Parsing the response: building a data wrangling function

Let’s build a function to select just some important information. We start by writing a few lines of code to parse the first article:

res <- pluck(search_res, 1)
res
$id
[1] "australia-news/2023/nov/15/peter-dutton-accused-weaponising-antisemitism-debate-parliament-immigration-detention-israel-hamas-war-gaza"

$type
[1] "article"

$sectionId
[1] "australia-news"

$sectionName
[1] "Australia news"

$webPublicationDate
[1] "2023-11-15T07:19:09Z"

$webTitle
[1] "Peter Dutton accused of ‘weaponising antisemitism’ during fiery debate in parliament"

$webUrl
[1] "https://www.theguardian.com/australia-news/2023/nov/15/peter-dutton-accused-weaponising-antisemitism-debate-parliament-immigration-detention-israel-hamas-war-gaza"

$apiUrl
[1] "https://content.guardianapis.com/australia-news/2023/nov/15/peter-dutton-accused-weaponising-antisemitism-debate-parliament-immigration-detention-israel-hamas-war-gaza"

$blocks
$blocks$main
$blocks$main$id
[1] "65546f008f0894c5322391dd"

$blocks$main$bodyHtml
[1] "<figure class=\"element element-atom\"> <gu-atom data-atom-id=\"af2b6f80-463e-4d78-a85c-3927f59481db\"         data-atom-type=\"media\"    > </gu-atom> </figure>"

$blocks$main$bodyTextSummary
[1] ""

$blocks$main$attributes
named list()

$blocks$main$published
[1] TRUE

$blocks$main$createdDate
[1] "2023-11-15T07:19:09Z"

$blocks$main$lastModifiedDate
[1] "2023-11-15T07:10:56Z"

$blocks$main$contributors
list()

$blocks$main$elements
$blocks$main$elements[[1]]
$blocks$main$elements[[1]]$type
[1] "contentatom"

$blocks$main$elements[[1]]$assets
list()

$blocks$main$elements[[1]]$contentAtomTypeData
$blocks$main$elements[[1]]$contentAtomTypeData$atomId
[1] "af2b6f80-463e-4d78-a85c-3927f59481db"

$blocks$main$elements[[1]]$contentAtomTypeData$atomType
[1] "media"





$blocks$body
$blocks$body[[1]]
$blocks$body[[1]]$id
[1] "655447be8f0894c5322390e5"

$blocks$body[[1]]$bodyHtml
[1] "<p>Anthony Albanese has accused Peter Dutton of “weaponising antisemitism” during a heated parliamentary debate, after the opposition leader attempted to link criticisms of the government’s response to the Gaza conflict and the release of detainees from immigration detention.</p> <p>The prime minister was visibly angry during a fiery question time and responded to a Coalition motion, saying both Jewish and Muslim communities were scared and being threatened as the Israel-Hamas conflict continued. As the government faces criticism from its right and left factions over its responses to the Gaza war and the <a href=\"https://www.theguardian.com/australia-news/2023/nov/08/australia-high-court-indefinite-detention-ruling-government\">high court’s decision that indefinite immigration detention is unlawful</a>, Albanese called for political leaders to strive for unity.</p> <ul> <li><p><strong><a href=\"https://www.theguardian.com/email-newsletters?CMP=copyembed\">Sign up for Guardian Australia’s free morning and afternoon email newsletters for your daily news roundup</a></strong></p></li> </ul> <p>“To come in here and move this resolution, and link antisemitism with the decision of the high court, is beyond contempt,” he said in response to Dutton’s motion.</p> <p>“I didn’t think that he could go this low as to link these two issues.”</p> <p>Despite including both in the motion, Dutton dismissed Albanese’s criticisms by saying “there is no link” between antisemitism and the high court case, accusing the prime minister of “concocted outrage”.</p> <p>Dutton had earlier demanded Albanese cancel his trip to the Apec meeting in San Francisco, to remain in Australia and oversee federal responses to the issues. 
Dutton told a press conference “the first charge of the prime minister is to keep the Australian people safe”, asking for a national cabinet meeting of state leaders to discuss community safety.</p> <p>Home affairs minister, Clare O’Neil, and immigration minister Andrew Giles said in a statement on Wednesday night that the high court case raised “complex issues” around community safety, and that the government would “introduce and seek to pass this legislation tomorrow to further respond to the High Court’s decision”.</p> <p>“The full implications will not be clear until the high court has provided written reasons for their judgement, which means further legislation may be required once that judgement has been considered.”</p> <p>The opposition leader claimed Labor had a “a divided caucus in relation to the Israel issue”. Guardian Australia has reported Labor MPs and party members, including <a href=\"https://www.theguardian.com/australia-news/2023/nov/14/anthony-albanese-labor-branch-alp-marrickville-gaza-ceasefire-israel\">in Albanese’s own local branch</a>, are agitating for the government to <a href=\"https://www.theguardian.com/world/2023/nov/11/why-labor-is-on-a-tightrope-over-its-response-to-the-israel-hamas-war\">publicly call for a Gaza ceasefire</a>.</p> <p>Dutton brought the criticism to question time on Wednesday, moving a motion calling on the House of Representatives to express its concern at rising antisemitism and the release of 80 people from detention.</p> <p>Dutton said the 7 October attack by Hamas had sparked a new wave of antisemitism.</p> <p>“This prime minister had a solemn duty to stand up and make sure that his government spoke with one voice,” Dutton claimed, accusing the government of “speaking out of both sides of its mouth”.</p> <p>“The caucus is split right down the middle. The Australian public sees this as a government where the wheels are quickly falling off. 
It’s given rise to social disharmony in this country.”</p> <p>Dutton went on to claim that “what compounds it” was what he called a “decision” from the immigration minister, Andrew Giles, to release into the community more than 80 people previously in detention after the high court decision.</p> <p>In response to opposition questions, Giles said “complying with the orders of the high court is not something that is optional”.</p> <p>In a furious response to Dutton, Albanese accused the opposition of “overreach”.</p> <p>“The weaponisation, or attempt to weaponise antisemitism, in this chamber and make it a partisan issue is frankly beyond contempt,” he said.</p> <p>“I make no apologies for standing up against antisemitism and I will do it unequivocally … But I also have a track record of standing up for the rights and for justice of Palestinian people.”</p> <aside class=\"element element-rich-link element--thumbnail\"> <p> <span>Related: </span><a href=\"https://www.theguardian.com/australia-news/2023/nov/08/australia-high-court-indefinite-detention-ruling-government\">Indefinite immigration detention ruled unlawful in landmark Australian high court decision </a> </p> </aside>  <p>The prime minister said the government needed to support both Jewish and Muslim communities and stand up for the rights of both Israelis and Palestinians.</p> <p>“Jewish Australians are fearful at the moment,” Albanese said. “The sort of activity that is occurring is scaring them and I stand with them. 
No one should threaten people because of their religion or their race in this country.</p> <p>“But it is also the case that Arab Australians and Islamic Australians and women wearing hijabs in the streets of Sydney and Melbourne are being threatened and I stand against that as well.”</p> <p>Albanese criticised the Coalition for its attempts under the Turnbull government to repeal or water down section 18C of the Racial Discrimination Act, despite concern from the Jewish community.</p> <p>Earlier on Wednesday Dutton brushed off a question about whether he regretted that move, given his interest in social cohesion. “I’ll let you live in the past,” Dutton responsed.</p> <p>Independent MPs condemned the opposition’s question time tactics. The member for Goldstein, Zoe Daniel, claimed it was “promoting social division not social cohesion”; the member for Wentworth, Allegra Spender, claimed the “attempt to weaponise and politicise antisemitism for political gain is unconscionable”.</p> <p>After question time, the shadow immigration minister, Dan Tehan, sought to portray the government’s decision to bring urgent legislation as at odds with earlier statements it was not possible to completely counteract the high court’s decision.</p> <p>The government bill will advance policies that do not re-detain people, likely to include greater use of conditions on visas, compliance and monitoring.</p> <p>Tehan told reporters the Coalition would use its briefing to ask government lawyers “all the legal avenues we have to … lock these hardened criminals up again”. Not all of the people released have been convicted of crimes.</p> <p>Tehan said that he was confident a legal solution existed, although he was unable to identify what it might be.</p> <p>“We want to see these people locked up again,” he said.</p>"

$blocks$body[[1]]$bodyTextSummary
[1] "Anthony Albanese has accused Peter Dutton of “weaponising antisemitism” during a heated parliamentary debate, after the opposition leader attempted to link criticisms of the government’s response to the Gaza conflict and the release of detainees from immigration detention. The prime minister was visibly angry during a fiery question time and responded to a Coalition motion, saying both Jewish and Muslim communities were scared and being threatened as the Israel-Hamas conflict continued. As the government faces criticism from its right and left factions over its responses to the Gaza war and the high court’s decision that indefinite immigration detention is unlawful, Albanese called for political leaders to strive for unity. Sign up for Guardian Australia’s free morning and afternoon email newsletters for your daily news roundup “To come in here and move this resolution, and link antisemitism with the decision of the high court, is beyond contempt,” he said in response to Dutton’s motion. “I didn’t think that he could go this low as to link these two issues.” Despite including both in the motion, Dutton dismissed Albanese’s criticisms by saying “there is no link” between antisemitism and the high court case, accusing the prime minister of “concocted outrage”. Dutton had earlier demanded Albanese cancel his trip to the Apec meeting in San Francisco, to remain in Australia and oversee federal responses to the issues. Dutton told a press conference “the first charge of the prime minister is to keep the Australian people safe”, asking for a national cabinet meeting of state leaders to discuss community safety. Home affairs minister, Clare O’Neil, and immigration minister Andrew Giles said in a statement on Wednesday night that the high court case raised “complex issues” around community safety, and that the government would “introduce and seek to pass this legislation tomorrow to further respond to the High Court’s decision”. 
“The full implications will not be clear until the high court has provided written reasons for their judgement, which means further legislation may be required once that judgement has been considered.” The opposition leader claimed Labor had a “a divided caucus in relation to the Israel issue”. Guardian Australia has reported Labor MPs and party members, including in Albanese’s own local branch, are agitating for the government to publicly call for a Gaza ceasefire. Dutton brought the criticism to question time on Wednesday, moving a motion calling on the House of Representatives to express its concern at rising antisemitism and the release of 80 people from detention. Dutton said the 7 October attack by Hamas had sparked a new wave of antisemitism. “This prime minister had a solemn duty to stand up and make sure that his government spoke with one voice,” Dutton claimed, accusing the government of “speaking out of both sides of its mouth”. “The caucus is split right down the middle. The Australian public sees this as a government where the wheels are quickly falling off. It’s given rise to social disharmony in this country.” Dutton went on to claim that “what compounds it” was what he called a “decision” from the immigration minister, Andrew Giles, to release into the community more than 80 people previously in detention after the high court decision. In response to opposition questions, Giles said “complying with the orders of the high court is not something that is optional”. In a furious response to Dutton, Albanese accused the opposition of “overreach”. “The weaponisation, or attempt to weaponise antisemitism, in this chamber and make it a partisan issue is frankly beyond contempt,” he said. 
“I make no apologies for standing up against antisemitism and I will do it unequivocally … But I also have a track record of standing up for the rights and for justice of Palestinian people.”\nThe prime minister said the government needed to support both Jewish and Muslim communities and stand up for the rights of both Israelis and Palestinians. “Jewish Australians are fearful at the moment,” Albanese said. “The sort of activity that is occurring is scaring them and I stand with them. No one should threaten people because of their religion or their race in this country. “But it is also the case that Arab Australians and Islamic Australians and women wearing hijabs in the streets of Sydney and Melbourne are being threatened and I stand against that as well.” Albanese criticised the Coalition for its attempts under the Turnbull government to repeal or water down section 18C of the Racial Discrimination Act, despite concern from the Jewish community. Earlier on Wednesday Dutton brushed off a question about whether he regretted that move, given his interest in social cohesion. “I’ll let you live in the past,” Dutton responsed. Independent MPs condemned the opposition’s question time tactics. The member for Goldstein, Zoe Daniel, claimed it was “promoting social division not social cohesion”; the member for Wentworth, Allegra Spender, claimed the “attempt to weaponise and politicise antisemitism for political gain is unconscionable”. After question time, the shadow immigration minister, Dan Tehan, sought to portray the government’s decision to bring urgent legislation as at odds with earlier statements it was not possible to completely counteract the high court’s decision. The government bill will advance policies that do not re-detain people, likely to include greater use of conditions on visas, compliance and monitoring. 
Tehan told reporters the Coalition would use its briefing to ask government lawyers “all the legal avenues we have to … lock these hardened criminals up again”. Not all of the people released have been convicted of crimes. Tehan said that he was confident a legal solution existed, although he was unable to identify what it might be. “We want to see these people locked up again,” he said."

$blocks$body[[1]]$attributes
named list()

$blocks$body[[1]]$published
[1] TRUE

$blocks$body[[1]]$createdDate
[1] "2023-11-15T07:19:09Z"

$blocks$body[[1]]$firstPublishedDate
[1] "2023-11-15T07:26:24Z"

$blocks$body[[1]]$publishedDate
[1] "2023-11-15T10:37:13Z"

$blocks$body[[1]]$lastModifiedDate
[1] "2023-11-15T10:37:13Z"

$blocks$body[[1]]$contributors
list()

$blocks$body[[1]]$elements
$blocks$body[[1]]$elements[[1]]
$blocks$body[[1]]$elements[[1]]$type
[1] "text"

$blocks$body[[1]]$elements[[1]]$assets
list()

$blocks$body[[1]]$elements[[1]]$textTypeData
$blocks$body[[1]]$elements[[1]]$textTypeData$html
[1] "<p>Anthony Albanese has accused Peter Dutton of “weaponising antisemitism” during a heated parliamentary debate, after the opposition leader attempted to link criticisms of the government’s response to the Gaza conflict and the release of detainees from immigration detention.</p> \n<p>The prime minister was visibly angry during a fiery question time and responded to a Coalition motion, saying both Jewish and Muslim communities were scared and being threatened as the Israel-Hamas conflict continued. As the government faces criticism from its right and left factions over its responses to the Gaza war and the <a href=\"https://www.theguardian.com/australia-news/2023/nov/08/australia-high-court-indefinite-detention-ruling-government\">high court’s decision that indefinite immigration detention is unlawful</a>, Albanese called for political leaders to strive for unity.</p> \n<ul> \n <li><p><strong><a href=\"https://www.theguardian.com/email-newsletters?CMP=copyembed\">Sign up for Guardian Australia’s free morning and afternoon email newsletters for your daily news roundup</a></strong></p></li> \n</ul> \n<p>“To come in here and move this resolution, and link antisemitism with the decision of the high court, is beyond contempt,” he said in response to Dutton’s motion.</p> \n<p>“I didn’t think that he could go this low as to link these two issues.”</p> \n<p>Despite including both in the motion, Dutton dismissed Albanese’s criticisms by saying “there is no link” between antisemitism and the high court case, accusing the prime minister of “concocted outrage”.</p> \n<p>Dutton had earlier demanded Albanese cancel his trip to the Apec meeting in San Francisco, to remain in Australia and oversee federal responses to the issues. 
Dutton told a press conference “the first charge of the prime minister is to keep the Australian people safe”, asking for a national cabinet meeting of state leaders to discuss community safety.</p> \n<p>Home affairs minister, Clare O’Neil, and immigration minister Andrew Giles said in a statement on Wednesday night that the high court case raised “complex issues” around community safety, and that the government would “introduce and seek to pass this legislation tomorrow to further respond to the High Court’s decision”.</p> \n<p>“The full implications will not be clear until the high court has provided written reasons for their judgement, which means further legislation may be required once that judgement has been considered.”</p> \n<p>The opposition leader claimed Labor had a “a divided caucus in relation to the Israel issue”. Guardian Australia has reported Labor MPs and party members, including <a href=\"https://www.theguardian.com/australia-news/2023/nov/14/anthony-albanese-labor-branch-alp-marrickville-gaza-ceasefire-israel\">in Albanese’s own local branch</a>, are agitating for the government to <a href=\"https://www.theguardian.com/world/2023/nov/11/why-labor-is-on-a-tightrope-over-its-response-to-the-israel-hamas-war\">publicly call for a Gaza ceasefire</a>.</p> \n<p>Dutton brought the criticism to question time on Wednesday, moving a motion calling on the House of Representatives to express its concern at rising antisemitism and the release of 80 people from detention.</p> \n<p>Dutton said the 7 October attack by Hamas had sparked a new wave of antisemitism.</p> \n<p>“This prime minister had a solemn duty to stand up and make sure that his government spoke with one voice,” Dutton claimed, accusing the government of “speaking out of both sides of its mouth”.</p> \n<p>“The caucus is split right down the middle. The Australian public sees this as a government where the wheels are quickly falling off. 
It’s given rise to social disharmony in this country.”</p> \n<p>Dutton went on to claim that “what compounds it” was what he called a “decision” from the immigration minister, Andrew Giles, to release into the community more than 80 people previously in detention after the high court decision.</p> \n<p>In response to opposition questions, Giles said “complying with the orders of the high court is not something that is optional”.</p> \n<p>In a furious response to Dutton, Albanese accused the opposition of “overreach”.</p> \n<p>“The weaponisation, or attempt to weaponise antisemitism, in this chamber and make it a partisan issue is frankly beyond contempt,” he said.</p> \n<p>“I make no apologies for standing up against antisemitism and I will do it unequivocally … But I also have a track record of standing up for the rights and for justice of Palestinian people.”</p>"



$blocks$body[[1]]$elements[[2]]
$blocks$body[[1]]$elements[[2]]$type
[1] "rich-link"

$blocks$body[[1]]$elements[[2]]$assets
list()

$blocks$body[[1]]$elements[[2]]$richLinkTypeData
$blocks$body[[1]]$elements[[2]]$richLinkTypeData$url
[1] "https://www.theguardian.com/australia-news/2023/nov/08/australia-high-court-indefinite-detention-ruling-government"

$blocks$body[[1]]$elements[[2]]$richLinkTypeData$originalUrl
[1] "https://www.theguardian.com/australia-news/2023/nov/08/australia-high-court-indefinite-detention-ruling-government"

$blocks$body[[1]]$elements[[2]]$richLinkTypeData$linkText
[1] "Indefinite immigration detention ruled unlawful in landmark Australian high court decision "

$blocks$body[[1]]$elements[[2]]$richLinkTypeData$linkPrefix
[1] "Related: "

$blocks$body[[1]]$elements[[2]]$richLinkTypeData$role
[1] "thumbnail"



$blocks$body[[1]]$elements[[3]]
$blocks$body[[1]]$elements[[3]]$type
[1] "text"

$blocks$body[[1]]$elements[[3]]$assets
list()

$blocks$body[[1]]$elements[[3]]$textTypeData
$blocks$body[[1]]$elements[[3]]$textTypeData$html
[1] "<p>The prime minister said the government needed to support both Jewish and Muslim communities and stand up for the rights of both Israelis and Palestinians.</p> \n<p>“Jewish Australians are fearful at the moment,” Albanese said. “The sort of activity that is occurring is scaring them and I stand with them. No one should threaten people because of their religion or their race in this country.</p> \n<p>“But it is also the case that Arab Australians and Islamic Australians and women wearing hijabs in the streets of Sydney and Melbourne are being threatened and I stand against that as well.”</p> \n<p>Albanese criticised the Coalition for its attempts under the Turnbull government to repeal or water down section 18C of the Racial Discrimination Act, despite concern from the Jewish community.</p> \n<p>Earlier on Wednesday Dutton brushed off a question about whether he regretted that move, given his interest in social cohesion. “I’ll let you live in the past,” Dutton responsed.</p> \n<p>Independent MPs condemned the opposition’s question time tactics. The member for Goldstein, Zoe Daniel, claimed it was “promoting social division not social cohesion”; the member for Wentworth, Allegra Spender, claimed the “attempt to weaponise and politicise antisemitism for political gain is unconscionable”.</p> \n<p>After question time, the shadow immigration minister, Dan Tehan, sought to portray the government’s decision to bring urgent legislation as at odds with earlier statements it was not possible to completely counteract the high court’s decision.</p> \n<p>The government bill will advance policies that do not re-detain people, likely to include greater use of conditions on visas, compliance and monitoring.</p> \n<p>Tehan told reporters the Coalition would use its briefing to ask government lawyers “all the legal avenues we have to … lock these hardened criminals up again”. 
Not all of the people released have been convicted of crimes.</p> \n<p>Tehan said that he was confident a legal solution existed, although he was unable to identify what it might be.</p> \n<p>“We want to see these people locked up again,” he said.</p>"






$blocks$totalBodyBlocks
[1] 1


$isHosted
[1] FALSE

$pillarId
[1] "pillar/news"

$pillarName
[1] "News"
id <- res$id
id
[1] "australia-news/2023/nov/15/peter-dutton-accused-weaponising-antisemitism-debate-parliament-immigration-detention-israel-hamas-war-gaza"
type <- res$type
type
[1] "article"
time <- lubridate::ymd_hms(res$webPublicationDate)
time
[1] "2023-11-15 07:19:09 UTC"
headline <- res$webTitle
headline
[1] "Peter Dutton accused of ‘weaponising antisemitism’ during fiery debate in parliament"

Parsing the response: building a data wrangling function

So far so good, but where is the text? It seems to be stored in the nested “blocks” → “body” element. Let’s have a look:

pluck(res, "blocks", "body")
[[1]]
[[1]]$id
[1] "655447be8f0894c5322390e5"

[[1]]$bodyHtml
[1] "<p>Anthony Albanese has accused Peter Dutton of “weaponising antisemitism” during a heated parliamentary debate, after the opposition leader attempted to link criticisms of the government’s response to the Gaza conflict and the release of detainees from immigration detention.</p> <p>The prime minister was visibly angry during a fiery question time and responded to a Coalition motion, saying both Jewish and Muslim communities were scared and being threatened as the Israel-Hamas conflict continued. As the government faces criticism from its right and left factions over its responses to the Gaza war and the <a href=\"https://www.theguardian.com/australia-news/2023/nov/08/australia-high-court-indefinite-detention-ruling-government\">high court’s decision that indefinite immigration detention is unlawful</a>, Albanese called for political leaders to strive for unity.</p> <ul> <li><p><strong><a href=\"https://www.theguardian.com/email-newsletters?CMP=copyembed\">Sign up for Guardian Australia’s free morning and afternoon email newsletters for your daily news roundup</a></strong></p></li> </ul> <p>“To come in here and move this resolution, and link antisemitism with the decision of the high court, is beyond contempt,” he said in response to Dutton’s motion.</p> <p>“I didn’t think that he could go this low as to link these two issues.”</p> <p>Despite including both in the motion, Dutton dismissed Albanese’s criticisms by saying “there is no link” between antisemitism and the high court case, accusing the prime minister of “concocted outrage”.</p> <p>Dutton had earlier demanded Albanese cancel his trip to the Apec meeting in San Francisco, to remain in Australia and oversee federal responses to the issues. 
Dutton told a press conference “the first charge of the prime minister is to keep the Australian people safe”, asking for a national cabinet meeting of state leaders to discuss community safety.</p> <p>Home affairs minister, Clare O’Neil, and immigration minister Andrew Giles said in a statement on Wednesday night that the high court case raised “complex issues” around community safety, and that the government would “introduce and seek to pass this legislation tomorrow to further respond to the High Court’s decision”.</p> <p>“The full implications will not be clear until the high court has provided written reasons for their judgement, which means further legislation may be required once that judgement has been considered.”</p> <p>The opposition leader claimed Labor had a “a divided caucus in relation to the Israel issue”. Guardian Australia has reported Labor MPs and party members, including <a href=\"https://www.theguardian.com/australia-news/2023/nov/14/anthony-albanese-labor-branch-alp-marrickville-gaza-ceasefire-israel\">in Albanese’s own local branch</a>, are agitating for the government to <a href=\"https://www.theguardian.com/world/2023/nov/11/why-labor-is-on-a-tightrope-over-its-response-to-the-israel-hamas-war\">publicly call for a Gaza ceasefire</a>.</p> <p>Dutton brought the criticism to question time on Wednesday, moving a motion calling on the House of Representatives to express its concern at rising antisemitism and the release of 80 people from detention.</p> <p>Dutton said the 7 October attack by Hamas had sparked a new wave of antisemitism.</p> <p>“This prime minister had a solemn duty to stand up and make sure that his government spoke with one voice,” Dutton claimed, accusing the government of “speaking out of both sides of its mouth”.</p> <p>“The caucus is split right down the middle. The Australian public sees this as a government where the wheels are quickly falling off. 
It’s given rise to social disharmony in this country.”</p> <p>Dutton went on to claim that “what compounds it” was what he called a “decision” from the immigration minister, Andrew Giles, to release into the community more than 80 people previously in detention after the high court decision.</p> <p>In response to opposition questions, Giles said “complying with the orders of the high court is not something that is optional”.</p> <p>In a furious response to Dutton, Albanese accused the opposition of “overreach”.</p> <p>“The weaponisation, or attempt to weaponise antisemitism, in this chamber and make it a partisan issue is frankly beyond contempt,” he said.</p> <p>“I make no apologies for standing up against antisemitism and I will do it unequivocally … But I also have a track record of standing up for the rights and for justice of Palestinian people.”</p> <aside class=\"element element-rich-link element--thumbnail\"> <p> <span>Related: </span><a href=\"https://www.theguardian.com/australia-news/2023/nov/08/australia-high-court-indefinite-detention-ruling-government\">Indefinite immigration detention ruled unlawful in landmark Australian high court decision </a> </p> </aside>  <p>The prime minister said the government needed to support both Jewish and Muslim communities and stand up for the rights of both Israelis and Palestinians.</p> <p>“Jewish Australians are fearful at the moment,” Albanese said. “The sort of activity that is occurring is scaring them and I stand with them. 
No one should threaten people because of their religion or their race in this country.</p> <p>“But it is also the case that Arab Australians and Islamic Australians and women wearing hijabs in the streets of Sydney and Melbourne are being threatened and I stand against that as well.”</p> <p>Albanese criticised the Coalition for its attempts under the Turnbull government to repeal or water down section 18C of the Racial Discrimination Act, despite concern from the Jewish community.</p> <p>Earlier on Wednesday Dutton brushed off a question about whether he regretted that move, given his interest in social cohesion. “I’ll let you live in the past,” Dutton responsed.</p> <p>Independent MPs condemned the opposition’s question time tactics. The member for Goldstein, Zoe Daniel, claimed it was “promoting social division not social cohesion”; the member for Wentworth, Allegra Spender, claimed the “attempt to weaponise and politicise antisemitism for political gain is unconscionable”.</p> <p>After question time, the shadow immigration minister, Dan Tehan, sought to portray the government’s decision to bring urgent legislation as at odds with earlier statements it was not possible to completely counteract the high court’s decision.</p> <p>The government bill will advance policies that do not re-detain people, likely to include greater use of conditions on visas, compliance and monitoring.</p> <p>Tehan told reporters the Coalition would use its briefing to ask government lawyers “all the legal avenues we have to … lock these hardened criminals up again”. Not all of the people released have been convicted of crimes.</p> <p>Tehan said that he was confident a legal solution existed, although he was unable to identify what it might be.</p> <p>“We want to see these people locked up again,” he said.</p>"

[[1]]$bodyTextSummary
[1] "Anthony Albanese has accused Peter Dutton of “weaponising antisemitism” during a heated parliamentary debate, after the opposition leader attempted to link criticisms of the government’s response to the Gaza conflict and the release of detainees from immigration detention. The prime minister was visibly angry during a fiery question time and responded to a Coalition motion, saying both Jewish and Muslim communities were scared and being threatened as the Israel-Hamas conflict continued. As the government faces criticism from its right and left factions over its responses to the Gaza war and the high court’s decision that indefinite immigration detention is unlawful, Albanese called for political leaders to strive for unity. Sign up for Guardian Australia’s free morning and afternoon email newsletters for your daily news roundup “To come in here and move this resolution, and link antisemitism with the decision of the high court, is beyond contempt,” he said in response to Dutton’s motion. “I didn’t think that he could go this low as to link these two issues.” Despite including both in the motion, Dutton dismissed Albanese’s criticisms by saying “there is no link” between antisemitism and the high court case, accusing the prime minister of “concocted outrage”. Dutton had earlier demanded Albanese cancel his trip to the Apec meeting in San Francisco, to remain in Australia and oversee federal responses to the issues. Dutton told a press conference “the first charge of the prime minister is to keep the Australian people safe”, asking for a national cabinet meeting of state leaders to discuss community safety. Home affairs minister, Clare O’Neil, and immigration minister Andrew Giles said in a statement on Wednesday night that the high court case raised “complex issues” around community safety, and that the government would “introduce and seek to pass this legislation tomorrow to further respond to the High Court’s decision”. 
“The full implications will not be clear until the high court has provided written reasons for their judgement, which means further legislation may be required once that judgement has been considered.” The opposition leader claimed Labor had a “a divided caucus in relation to the Israel issue”. Guardian Australia has reported Labor MPs and party members, including in Albanese’s own local branch, are agitating for the government to publicly call for a Gaza ceasefire. Dutton brought the criticism to question time on Wednesday, moving a motion calling on the House of Representatives to express its concern at rising antisemitism and the release of 80 people from detention. Dutton said the 7 October attack by Hamas had sparked a new wave of antisemitism. “This prime minister had a solemn duty to stand up and make sure that his government spoke with one voice,” Dutton claimed, accusing the government of “speaking out of both sides of its mouth”. “The caucus is split right down the middle. The Australian public sees this as a government where the wheels are quickly falling off. It’s given rise to social disharmony in this country.” Dutton went on to claim that “what compounds it” was what he called a “decision” from the immigration minister, Andrew Giles, to release into the community more than 80 people previously in detention after the high court decision. In response to opposition questions, Giles said “complying with the orders of the high court is not something that is optional”. In a furious response to Dutton, Albanese accused the opposition of “overreach”. “The weaponisation, or attempt to weaponise antisemitism, in this chamber and make it a partisan issue is frankly beyond contempt,” he said. 
“I make no apologies for standing up against antisemitism and I will do it unequivocally … But I also have a track record of standing up for the rights and for justice of Palestinian people.”\nThe prime minister said the government needed to support both Jewish and Muslim communities and stand up for the rights of both Israelis and Palestinians. “Jewish Australians are fearful at the moment,” Albanese said. “The sort of activity that is occurring is scaring them and I stand with them. No one should threaten people because of their religion or their race in this country. “But it is also the case that Arab Australians and Islamic Australians and women wearing hijabs in the streets of Sydney and Melbourne are being threatened and I stand against that as well.” Albanese criticised the Coalition for its attempts under the Turnbull government to repeal or water down section 18C of the Racial Discrimination Act, despite concern from the Jewish community. Earlier on Wednesday Dutton brushed off a question about whether he regretted that move, given his interest in social cohesion. “I’ll let you live in the past,” Dutton responsed. Independent MPs condemned the opposition’s question time tactics. The member for Goldstein, Zoe Daniel, claimed it was “promoting social division not social cohesion”; the member for Wentworth, Allegra Spender, claimed the “attempt to weaponise and politicise antisemitism for political gain is unconscionable”. After question time, the shadow immigration minister, Dan Tehan, sought to portray the government’s decision to bring urgent legislation as at odds with earlier statements it was not possible to completely counteract the high court’s decision. The government bill will advance policies that do not re-detain people, likely to include greater use of conditions on visas, compliance and monitoring. 
Tehan told reporters the Coalition would use its briefing to ask government lawyers “all the legal avenues we have to … lock these hardened criminals up again”. Not all of the people released have been convicted of crimes. Tehan said that he was confident a legal solution existed, although he was unable to identify what it might be. “We want to see these people locked up again,” he said."

[[1]]$attributes
named list()

[[1]]$published
[1] TRUE

[[1]]$createdDate
[1] "2023-11-15T07:19:09Z"

[[1]]$firstPublishedDate
[1] "2023-11-15T07:26:24Z"

[[1]]$publishedDate
[1] "2023-11-15T10:37:13Z"

[[1]]$lastModifiedDate
[1] "2023-11-15T10:37:13Z"

[[1]]$contributors
list()

[[1]]$elements
[[1]]$elements[[1]]
[[1]]$elements[[1]]$type
[1] "text"

[[1]]$elements[[1]]$assets
list()

[[1]]$elements[[1]]$textTypeData
[[1]]$elements[[1]]$textTypeData$html
[1] "<p>Anthony Albanese has accused Peter Dutton of “weaponising antisemitism” during a heated parliamentary debate, after the opposition leader attempted to link criticisms of the government’s response to the Gaza conflict and the release of detainees from immigration detention.</p> \n<p>The prime minister was visibly angry during a fiery question time and responded to a Coalition motion, saying both Jewish and Muslim communities were scared and being threatened as the Israel-Hamas conflict continued. As the government faces criticism from its right and left factions over its responses to the Gaza war and the <a href=\"https://www.theguardian.com/australia-news/2023/nov/08/australia-high-court-indefinite-detention-ruling-government\">high court’s decision that indefinite immigration detention is unlawful</a>, Albanese called for political leaders to strive for unity.</p> \n<ul> \n <li><p><strong><a href=\"https://www.theguardian.com/email-newsletters?CMP=copyembed\">Sign up for Guardian Australia’s free morning and afternoon email newsletters for your daily news roundup</a></strong></p></li> \n</ul> \n<p>“To come in here and move this resolution, and link antisemitism with the decision of the high court, is beyond contempt,” he said in response to Dutton’s motion.</p> \n<p>“I didn’t think that he could go this low as to link these two issues.”</p> \n<p>Despite including both in the motion, Dutton dismissed Albanese’s criticisms by saying “there is no link” between antisemitism and the high court case, accusing the prime minister of “concocted outrage”.</p> \n<p>Dutton had earlier demanded Albanese cancel his trip to the Apec meeting in San Francisco, to remain in Australia and oversee federal responses to the issues. 
Dutton told a press conference “the first charge of the prime minister is to keep the Australian people safe”, asking for a national cabinet meeting of state leaders to discuss community safety.</p> \n<p>Home affairs minister, Clare O’Neil, and immigration minister Andrew Giles said in a statement on Wednesday night that the high court case raised “complex issues” around community safety, and that the government would “introduce and seek to pass this legislation tomorrow to further respond to the High Court’s decision”.</p> \n<p>“The full implications will not be clear until the high court has provided written reasons for their judgement, which means further legislation may be required once that judgement has been considered.”</p> \n<p>The opposition leader claimed Labor had a “a divided caucus in relation to the Israel issue”. Guardian Australia has reported Labor MPs and party members, including <a href=\"https://www.theguardian.com/australia-news/2023/nov/14/anthony-albanese-labor-branch-alp-marrickville-gaza-ceasefire-israel\">in Albanese’s own local branch</a>, are agitating for the government to <a href=\"https://www.theguardian.com/world/2023/nov/11/why-labor-is-on-a-tightrope-over-its-response-to-the-israel-hamas-war\">publicly call for a Gaza ceasefire</a>.</p> \n<p>Dutton brought the criticism to question time on Wednesday, moving a motion calling on the House of Representatives to express its concern at rising antisemitism and the release of 80 people from detention.</p> \n<p>Dutton said the 7 October attack by Hamas had sparked a new wave of antisemitism.</p> \n<p>“This prime minister had a solemn duty to stand up and make sure that his government spoke with one voice,” Dutton claimed, accusing the government of “speaking out of both sides of its mouth”.</p> \n<p>“The caucus is split right down the middle. The Australian public sees this as a government where the wheels are quickly falling off. 
It’s given rise to social disharmony in this country.”</p> \n<p>Dutton went on to claim that “what compounds it” was what he called a “decision” from the immigration minister, Andrew Giles, to release into the community more than 80 people previously in detention after the high court decision.</p> \n<p>In response to opposition questions, Giles said “complying with the orders of the high court is not something that is optional”.</p> \n<p>In a furious response to Dutton, Albanese accused the opposition of “overreach”.</p> \n<p>“The weaponisation, or attempt to weaponise antisemitism, in this chamber and make it a partisan issue is frankly beyond contempt,” he said.</p> \n<p>“I make no apologies for standing up against antisemitism and I will do it unequivocally … But I also have a track record of standing up for the rights and for justice of Palestinian people.”</p>"



[[1]]$elements[[2]]
[[1]]$elements[[2]]$type
[1] "rich-link"

[[1]]$elements[[2]]$assets
list()

[[1]]$elements[[2]]$richLinkTypeData
[[1]]$elements[[2]]$richLinkTypeData$url
[1] "https://www.theguardian.com/australia-news/2023/nov/08/australia-high-court-indefinite-detention-ruling-government"

[[1]]$elements[[2]]$richLinkTypeData$originalUrl
[1] "https://www.theguardian.com/australia-news/2023/nov/08/australia-high-court-indefinite-detention-ruling-government"

[[1]]$elements[[2]]$richLinkTypeData$linkText
[1] "Indefinite immigration detention ruled unlawful in landmark Australian high court decision "

[[1]]$elements[[2]]$richLinkTypeData$linkPrefix
[1] "Related: "

[[1]]$elements[[2]]$richLinkTypeData$role
[1] "thumbnail"



[[1]]$elements[[3]]
[[1]]$elements[[3]]$type
[1] "text"

[[1]]$elements[[3]]$assets
list()

[[1]]$elements[[3]]$textTypeData
[[1]]$elements[[3]]$textTypeData$html
[1] "<p>The prime minister said the government needed to support both Jewish and Muslim communities and stand up for the rights of both Israelis and Palestinians.</p> \n<p>“Jewish Australians are fearful at the moment,” Albanese said. “The sort of activity that is occurring is scaring them and I stand with them. No one should threaten people because of their religion or their race in this country.</p> \n<p>“But it is also the case that Arab Australians and Islamic Australians and women wearing hijabs in the streets of Sydney and Melbourne are being threatened and I stand against that as well.”</p> \n<p>Albanese criticised the Coalition for its attempts under the Turnbull government to repeal or water down section 18C of the Racial Discrimination Act, despite concern from the Jewish community.</p> \n<p>Earlier on Wednesday Dutton brushed off a question about whether he regretted that move, given his interest in social cohesion. “I’ll let you live in the past,” Dutton responsed.</p> \n<p>Independent MPs condemned the opposition’s question time tactics. The member for Goldstein, Zoe Daniel, claimed it was “promoting social division not social cohesion”; the member for Wentworth, Allegra Spender, claimed the “attempt to weaponise and politicise antisemitism for political gain is unconscionable”.</p> \n<p>After question time, the shadow immigration minister, Dan Tehan, sought to portray the government’s decision to bring urgent legislation as at odds with earlier statements it was not possible to completely counteract the high court’s decision.</p> \n<p>The government bill will advance policies that do not re-detain people, likely to include greater use of conditions on visas, compliance and monitoring.</p> \n<p>Tehan told reporters the Coalition would use its briefing to ask government lawyers “all the legal avenues we have to … lock these hardened criminals up again”. 
Not all of the people released have been convicted of crimes.</p> \n<p>Tehan said that he was confident a legal solution existed, although he was unable to identify what it might be.</p> \n<p>“We want to see these people locked up again,” he said.</p>"

Parsing the response: building a data wrangling function

It seems the API returns articles as HTML strings. Luckily, we know how to extract text from that 😎

library(rvest)
text <- pluck(res, "blocks", "body", 1, "bodyHtml") |> 
  read_html() |> 
  html_text2()
text
[1] "Anthony Albanese has accused Peter Dutton of “weaponising antisemitism” during a heated parliamentary debate, after the opposition leader attempted to link criticisms of the government’s response to the Gaza conflict and the release of detainees from immigration detention.\n\nThe prime minister was visibly angry during a fiery question time and responded to a Coalition motion, saying both Jewish and Muslim communities were scared and being threatened as the Israel-Hamas conflict continued. As the government faces criticism from its right and left factions over its responses to the Gaza war and the high court’s decision that indefinite immigration detention is unlawful, Albanese called for political leaders to strive for unity.\n\nSign up for Guardian Australia’s free morning and afternoon email newsletters for your daily news roundup\n\n“To come in here and move this resolution, and link antisemitism with the decision of the high court, is beyond contempt,” he said in response to Dutton’s motion.\n\n“I didn’t think that he could go this low as to link these two issues.”\n\nDespite including both in the motion, Dutton dismissed Albanese’s criticisms by saying “there is no link” between antisemitism and the high court case, accusing the prime minister of “concocted outrage”.\n\nDutton had earlier demanded Albanese cancel his trip to the Apec meeting in San Francisco, to remain in Australia and oversee federal responses to the issues. 
Dutton told a press conference “the first charge of the prime minister is to keep the Australian people safe”, asking for a national cabinet meeting of state leaders to discuss community safety.\n\nHome affairs minister, Clare O’Neil, and immigration minister Andrew Giles said in a statement on Wednesday night that the high court case raised “complex issues” around community safety, and that the government would “introduce and seek to pass this legislation tomorrow to further respond to the High Court’s decision”.\n\n“The full implications will not be clear until the high court has provided written reasons for their judgement, which means further legislation may be required once that judgement has been considered.”\n\nThe opposition leader claimed Labor had a “a divided caucus in relation to the Israel issue”. Guardian Australia has reported Labor MPs and party members, including in Albanese’s own local branch, are agitating for the government to publicly call for a Gaza ceasefire.\n\nDutton brought the criticism to question time on Wednesday, moving a motion calling on the House of Representatives to express its concern at rising antisemitism and the release of 80 people from detention.\n\nDutton said the 7 October attack by Hamas had sparked a new wave of antisemitism.\n\n“This prime minister had a solemn duty to stand up and make sure that his government spoke with one voice,” Dutton claimed, accusing the government of “speaking out of both sides of its mouth”.\n\n“The caucus is split right down the middle. The Australian public sees this as a government where the wheels are quickly falling off. 
It’s given rise to social disharmony in this country.”\n\nDutton went on to claim that “what compounds it” was what he called a “decision” from the immigration minister, Andrew Giles, to release into the community more than 80 people previously in detention after the high court decision.\n\nIn response to opposition questions, Giles said “complying with the orders of the high court is not something that is optional”.\n\nIn a furious response to Dutton, Albanese accused the opposition of “overreach”.\n\n“The weaponisation, or attempt to weaponise antisemitism, in this chamber and make it a partisan issue is frankly beyond contempt,” he said.\n\n“I make no apologies for standing up against antisemitism and I will do it unequivocally … But I also have a track record of standing up for the rights and for justice of Palestinian people.”\n\nRelated: Indefinite immigration detention ruled unlawful in landmark Australian high court decision\n\nThe prime minister said the government needed to support both Jewish and Muslim communities and stand up for the rights of both Israelis and Palestinians.\n\n“Jewish Australians are fearful at the moment,” Albanese said. “The sort of activity that is occurring is scaring them and I stand with them. No one should threaten people because of their religion or their race in this country.\n\n“But it is also the case that Arab Australians and Islamic Australians and women wearing hijabs in the streets of Sydney and Melbourne are being threatened and I stand against that as well.”\n\nAlbanese criticised the Coalition for its attempts under the Turnbull government to repeal or water down section 18C of the Racial Discrimination Act, despite concern from the Jewish community.\n\nEarlier on Wednesday Dutton brushed off a question about whether he regretted that move, given his interest in social cohesion. “I’ll let you live in the past,” Dutton responsed.\n\nIndependent MPs condemned the opposition’s question time tactics. 
The member for Goldstein, Zoe Daniel, claimed it was “promoting social division not social cohesion”; the member for Wentworth, Allegra Spender, claimed the “attempt to weaponise and politicise antisemitism for political gain is unconscionable”.\n\nAfter question time, the shadow immigration minister, Dan Tehan, sought to portray the government’s decision to bring urgent legislation as at odds with earlier statements it was not possible to completely counteract the high court’s decision.\n\nThe government bill will advance policies that do not re-detain people, likely to include greater use of conditions on visas, compliance and monitoring.\n\nTehan told reporters the Coalition would use its briefing to ask government lawyers “all the legal avenues we have to … lock these hardened criminals up again”. Not all of the people released have been convicted of crimes.\n\nTehan said that he was confident a legal solution existed, although he was unable to identify what it might be.\n\n“We want to see these people locked up again,” he said."
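
To see what html_text2() adds here, the following is a minimal sketch on a made-up HTML snippet (the string and the example.com link are assumptions for illustration, not API output). It compares html_text() with html_text2() and shows that links can still be pulled out of the same HTML:

```r
library(rvest)

# Made-up HTML in the same style as the bodyHtml field
html <- read_html(
  "<p>First paragraph.</p><p>Second with a <a href=\"https://example.com\">link</a>.</p>"
)

# html_text() concatenates the text nodes as they are;
# html_text2() keeps paragraph breaks, similar to how a browser renders them
plain <- html_text(html)
pretty <- html_text2(html)

# The links remain available as attributes of the <a> elements
links <- html |>
  html_elements("a") |>
  html_attr("href")
```

Keeping the paragraph breaks is useful if you later want to split an article into paragraphs.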

Parsing the response: finishing the data wrangling function

Let’s put this all together:

parse_response <- function(res) {
  
  text <- pluck(res, "blocks", "body", 1, "bodyHtml") |> 
    read_html() |> 
    html_text2()
  
  tibble(
    id = res$id,
    type = res$type,
    time = lubridate::ymd_hms(res$webPublicationDate),
    headline = res$webTitle,
    text = text
  )
}
parse_response(res)
# A tibble: 1 × 5
  id                                    type  time                headline text 
  <chr>                                 <chr> <dttm>              <chr>    <chr>
1 australia-news/2023/nov/15/peter-dut… arti… 2023-11-15 07:19:09 Peter D… "Ant…

We can loop over all articles returned by the API and apply this function to each of them:

map(search_res, parse_response) |> 
  bind_rows() # combine the list into one data.frame
# A tibble: 10 × 5
   id                                   type  time                headline text 
   <chr>                                <chr> <dttm>              <chr>    <chr>
 1 australia-news/2023/nov/15/peter-du… arti… 2023-11-15 07:19:09 Peter D… "Ant…
 2 world/article/2024/may/28/georgian-… arti… 2024-05-28 18:27:51 Georgia… "Geo…
 3 world/article/2024/jul/02/german-pa… arti… 2024-07-02 14:41:25 German … "Ger…
 4 world/article/2024/jun/07/who-are-t… arti… 2024-06-07 15:08:19 Who are… "Mos…
 5 politics/article/2024/jun/22/brexit… arti… 2024-06-22 15:52:52 Reopeni… "Kei…
 6 uk-news/2024/mar/22/jersey-debate-a… arti… 2024-03-22 14:45:27 Jersey … "Jer…
 7 politics/article/2024/jun/30/farage… arti… 2024-06-30 07:00:15 Farage … "The…
 8 australia-news/2024/mar/25/labor-al… arti… 2024-03-25 08:08:13 Labor a… "The…
 9 politics/article/2024/jun/26/sunak-… arti… 2024-06-26 21:30:57 Sunak a… "Ris…
10 world/2024/apr/11/poland-mps-debate… arti… 2024-04-11 13:58:50 Polish … "Pol…
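
If a single item cannot be parsed, the whole loop stops with an error. The following is a minimal sketch of one way to guard against that with purrr::possibly(). It assumes that some items might lack the “blocks” element, which is not something observed in the results above; mock_res is made-up data standing in for search_res.

```r
library(purrr)
library(dplyr)
library(rvest)

# Same function as above, repeated so the sketch is self-contained
parse_response <- function(res) {
  text <- pluck(res, "blocks", "body", 1, "bodyHtml") |>
    read_html() |>
    html_text2()

  tibble(
    id = res$id,
    type = res$type,
    time = lubridate::ymd_hms(res$webPublicationDate),
    headline = res$webTitle,
    text = text
  )
}

# Made-up stand-in for search_res: the second item has no "blocks" element
mock_res <- list(
  list(
    id = "mock/1", type = "article",
    webPublicationDate = "2024-01-01T10:00:00Z", webTitle = "First",
    blocks = list(body = list(list(bodyHtml = "<p>Hello</p><p>World</p>")))
  ),
  list(
    id = "mock/2", type = "liveblog",
    webPublicationDate = "2024-01-02T10:00:00Z", webTitle = "Second"
  )
)

# possibly() returns NULL instead of stopping when parse_response() fails
safe_parse <- possibly(parse_response, otherwise = NULL)

articles <- map(mock_res, safe_parse) |>
  bind_rows() # NULL entries are dropped when combining
articles
```

The trade-off is that failed items disappear silently, so it is worth comparing the number of rows with the number of items requested.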

Exercises 1

First, review the material and make sure you have a broad understanding of how to:

  • build a request to an API
  • perform the request
  • handle the response
  1. httr2 has several more functions to customize how a request is performed. What do these functions do?
    • req_throttle:
    • req_error:
    • req_retry:

  2. Make your own request to the API with a different search term

  3. You might want to add more information to the data.frame. Adapt the function parse_response to also extract: apiUrl, lastModifiedDate, pillarId

  4. Request page 2 from the API. You can search for the correct query parameter in the documentation here: https://open-platform.theguardian.com/documentation/

Example: The UK Parliament API

Background

  • The UK parliament offers several APIs
  • You can get data on members, constituencies, votes, etc.
  • The documentation is generated from OpenAPI specifications and rendered with Swagger, which is quite convenient

Exploring the Docs

We can look for an endpoint that interests us and even run an example right here!

/api/Members/Search endpoint

We even get a curl call, which makes this really convenient!

Note: what are cURL calls?

  • cURL is a command-line tool and library for making HTTP requests
  • think of it as a general non-R-specific httr2
  • it is widely used for API calls from the terminal
  • it lists the parameters of a call in a pretty readable manner:
    • the unnamed argument in the beginning is the Uniform Resource Locator (URL) the request goes to
    • -H arguments describe the headers, which are arguments sent with the call
    • -d is the data or body of a request, which is used e.g., for uploading things
    • --compressed means to ask for a compressed response which is unpacked locally (saves bandwidth)
curl 'https://www.researchgate.net/profile/Johannes-Gruber-2' \
  -H 'accept-language: en-GB,en;q=0.9' \
  -H 'cache-control: max-age=0' \
  -H 'Cookie: [Redacted]' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36' \
  --compressed

A more advanced curl call
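
The parts of a curl call map quite directly onto httr2 functions. The following is a minimal sketch that builds a request by hand without sending it. The example.com endpoint, the header value and the body are made up for illustration, and reading req$url relies on how httr2 currently stores the request object.

```r
library(httr2)

# Hypothetical endpoint, used only for illustration; nothing is sent
req <- request("https://example.com/api/search") |>
  # -H 'accept: application/json' becomes a header
  req_headers(accept = "application/json") |>
  # -d '{"q": "parliament"}' becomes the body of the request
  req_body_json(list(q = "parliament"))

# The unnamed first argument of the curl call is the URL
req$url

# Printing the object summarises what would be sent
req
```

Only req_perform() would actually contact the server, so a request can be built and checked step by step first.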

Translating the example request

What’s great about curl calls is that httr2 has a way to translate them into R code:

curl_translate("curl -X 'GET' \
  'https://members-api.parliament.uk/api/Members/Search?Name=Major&skip=0&take=20' \
  -H 'accept: text/plain'")
request("https://members-api.parliament.uk/api/Members/Search") |> 
  req_method("GET") |> 
  req_url_query(
    Name = "Major",
    skip = "0",
    take = "20",
  ) |> 
  req_headers(
    accept = "text/plain",
  ) |> 
  req_perform()

Some pointers:

  • make sure to escape " when translating curl calls. You can use the search and replace tool in RStudio and turn " inside the curl string into \"
  • when you call just curl_translate(), it uses what is currently in your clipboard, parses it, and copies the result back to your clipboard

Making the first request from R

We can copy the output from curl_translate() and run it in R. I also added resp_body_json(), since we already know the returned data will be JSON.

search <- request("https://members-api.parliament.uk/api/Members/Search?Name=Major&skip=0&take=20") |>
  req_method("GET") |>
  req_headers(
    accept = "text/plain",
  ) |>
  req_perform() |>
  resp_body_json()
pluck(search, "totalResults")
[1] 1
pluck(search, "items", 1) |> 
  lobstr::tree(max_length = 25)
<list>
├─value: <list>
│ ├─id: 119
│ ├─nameListAs: "Major, Mr John"
│ ├─nameDisplayAs: "Mr John Major"
│ ├─nameFullTitle: "Rt Hon John Major"
│ ├─nameAddressAs: "Mr Major"
│ ├─latestParty: <list>
│ │ ├─id: 4
│ │ ├─name: "Conservative"
│ │ ├─abbreviation: "Con"
│ │ ├─backgroundColour: "0063ba"
│ │ ├─foregroundColour: "ffffff"
│ │ ├─isLordsMainParty: TRUE
│ │ ├─isLordsSpiritualParty: FALSE
│ │ ├─governmentType: 3
│ │ └─isIndependentParty: FALSE
│ ├─gender: "M"
│ ├─latestHouseMembership: <list>
│ │ ├─membershipFrom: "Huntingdon"
│ │ ├─membershipFromId: 1530
│ │ ├─house: 1
│ │ ├─membershipStartDate: "1979-05-03T00:00:00"
│ │ ├─membershipEndDate: "2001-06-07T00:00:00"
│ │ ├─membershipEndReason: "Dissolution"
... 

Wrangling the data

As usual, we get some meta information like totalResults and the data in a list. To make the items more useful, we can bring them into a tabular format.

items <- pluck(search, "items")
tibble(
  id                    = map_int(items, function(i) pluck(i, "value", "id")),
  nameListAs            = map_chr(items, function(i) pluck(i, "value", "nameListAs")),
  nameDisplayAs         = map_chr(items, function(i) pluck(i, "value", "nameDisplayAs")),
  nameFullTitle         = map_chr(items, function(i) pluck(i, "value", "nameFullTitle")),
  nameAddressAs         = map_chr(items, function(i) pluck(i, "value", "nameAddressAs")),
  gender                = map_chr(items, function(i) pluck(i, "value", "gender")),
  latestParty           = map(items, function(i) pluck(i, "value", "latestParty")),
  latestHouseMembership = map(items, function(i) pluck(i, "value", "latestHouseMembership")),
  test                  = map_chr(items, function(i) pluck(i, "value", "test", .default = NA))
)
# A tibble: 1 × 9
     id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender latestParty 
  <int> <chr>      <chr>         <chr>         <chr>         <chr>  <list>      
1   119 Major, Mr… Mr John Major Rt Hon John … Mr Major      M      <named list>
# ℹ 2 more variables: latestHouseMembership <list>, test <chr>

This code is relatively busy, so let’s deconstruct it a little:

  • tibble wraps the results in a tibble
  • items is a list. To extract the first element from it, we used pluck(search, "items", 1), but there is usually more than one result, so we need to loop over the results using a map_* function
  • We know what types to expect from our first request, so we choose map_int for integer fields, map_chr for character fields and map for lists
  • we included the test column simply to show why we use pluck here instead of e.g., i[["value"]][["id"]]: we can set a default value if nothing is found
    • many APIs are inconsistent in what they return
    • if you try to extract a field deep in a list with [[]], you will either get an error that the field does not exist or NULL (which then causes an error in the map_* functions and tibble())
    • returning NA instead makes the parsing safer and is good practice
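
To make the difference concrete, here is a minimal sketch with made-up items (not real API output) in which the second record lacks a field. The typed default NA_character_ is an assumption of this sketch, used so the result fits map_chr without any coercion.

```r
library(purrr)

# two mock records; the second one has no nameAddressAs field
items <- list(
  list(value = list(id = 1L, nameAddressAs = "Mr Example")),
  list(value = list(id = 2L))
)

# [[ ]] silently returns NULL for the missing field
missing_field <- items[[2]][["value"]][["nameAddressAs"]]
is.null(missing_field)

# pluck() with a default returns NA instead, so map_chr() gets one value per item
address <- map_chr(items, function(i) {
  pluck(i, "value", "nameAddressAs", .default = NA_character_)
})
address
```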

Wrapping the endpoint in a function

APIs are useful because you can request all kinds of information using a few parameters. This lends itself very well to wrapping specific calls in functions.

# make a new function with different default
safe_pluck <- function(...) {
  pluck(..., .default = NA)
}

search_members <- function(name) {
  
  # request
  resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
    req_method("GET") |>
    req_url_query(
      Name = name
    ) |> 
    req_headers(
      accept = "text/plain",
    ) |>
    req_perform() |> 
    resp_body_json()
  
  # wrangle
  items <- pluck(resp, "items")
  return(tibble(
    id                    = map_int(items, function(i) safe_pluck(i, "value", "id")),
    nameListAs            = map_chr(items, function(i) safe_pluck(i, "value", "nameListAs")),
    nameDisplayAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameDisplayAs")),
    nameFullTitle         = map_chr(items, function(i) safe_pluck(i, "value", "nameFullTitle")),
    nameAddressAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameAddressAs")),
    gender                = map_chr(items, function(i) safe_pluck(i, "value", "gender")),
    latestParty           = map(items, function(i) safe_pluck(i, "value", "latestParty")),
    latestHouseMembership = map(items, function(i) safe_pluck(i, "value", "latestHouseMembership"))
  ))
  
}
search_members("Blair")
# A tibble: 4 × 8
     id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender latestParty 
  <int> <chr>      <chr>         <chr>         <chr>         <chr>  <list>      
1   512 Blair, Mr… Mr Tony Blair Rt Hon Tony … Mr Blair      M      <named list>
2  4182 Blair of … Lord Blair o… The Lord Bla… The Lord Bla… M      <named list>
3  4377 Donaldson… Stuart Blair… Stuart Blair… Stuart Blair… M      <named list>
4  5076 McDougall… Blair McDoug… Blair McDoug… <NA>          M      <named list>
# ℹ 1 more variable: latestHouseMembership <list>
search_members("Smith")
# A tibble: 20 × 8
      id nameListAs             nameDisplayAs nameFullTitle nameAddressAs gender
   <int> <chr>                  <chr>         <chr>         <chr>         <chr> 
 1   727 Buchanan-Smith, Alick  Alick Buchan… Rt Hon Alick… <NA>          M     
 2  4756 Clarke-Smith, Brendan  Brendan Clar… Brendan Clar… <NA>          M     
 3  2723 Delacourt-Smith of Al… Baroness Del… The Baroness… <NA>          F     
 4  2713 Dixon-Smith, L.        Lord Dixon-S… The Lord Dix… The Lord Dix… M     
 5   152 Duncan Smith, Sir Iain Sir Iain Dun… Rt Hon Sir I… Sir Iain Dun… M     
 6  2490 Goldsmith, L.          Lord Goldsmi… The Rt Hon. … <NA>          M     
 7  4062 Goldsmith of Richmond… Lord Goldsmi… The Right Ho… <NA>          M     
 8    29 Johnson Smith, Sir Ge… Sir Geoffrey… Sir Geoffrey… <NA>          M     
 9  5341 Kyrke-Smith, Laura     Laura Kyrke-… Laura Kyrke-… <NA>          F     
10  4554 McGregor-Smith, B.     Baroness McG… The Baroness… <NA>          F     
11  5273 Naismith, Connor       Connor Naism… Connor Naism… <NA>          M     
12   216 Naysmith, Dr Doug      Dr Doug Nays… Dr Doug Nays… Dr Naysmith   M     
13  4738 Smith, Alyn            Alyn Smith    Alyn Smith    <NA>          M     
14    95 Smith, Mr Andrew       Mr Andrew Sm… Rt Hon Andre… Mr Smith      M     
15  1564 Smith, Angela          Angela Smith  Angela Smith  Angela Smith  F     
16    30 Smith, Angela E.       Angela E. Sm… Rt Hon Angel… <NA>          F     
17  4436 Smith, Cat             Cat Smith     Cat Smith MP  Cat Smith     F     
18  1609 Smith, Chloe           Chloe Smith   Rt Hon Chloe… Chloe Smith   F     
19  1292 Smith, Sir Cyril       Sir Cyril Sm… Sir Cyril Sm… <NA>          M     
20  5218 Smith, David           David Smith   David Smith … <NA>          M     
# ℹ 2 more variables: latestParty <list>, latestHouseMembership <list>

The Smith search is a little odd since there are surely more than 20 results for this common name.

Wrapping the endpoint in a function: add pagination

  • Most APIs use pagination when the data matching a query becomes too big
  • In that case you need to iterate through the pages to get everything
  • The UK Parliament API handles pagination through two parameters:
    • skip: The number of records to skip from the first, default is 0
    • take: The number of records to return, default is 20. Maximum is 20
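
The arithmetic behind skip and take can be sketched without contacting the API. The helper name page_offsets and the totals used below are hypothetical and only serve as illustration:

```r
# skip values needed to cover `total` records in pages of size `take`
page_offsets <- function(total, take = 20) {
  if (total <= 0) {
    return(numeric(0)) # nothing to fetch
  }
  seq(from = 0, to = total - 1, by = take)
}

page_offsets(50) # three pages are needed for 50 records
page_offsets(20) # exactly one full page
```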

So to get the second page with the next 20 results, we need to adapt the call:

resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
  req_method("GET") |>
  req_url_query(
    Name = "Smith",
    take = 20,
    skip = 20
  ) |> 
  req_headers(
    accept = "text/plain",
  ) |>
  req_perform() |> 
  resp_body_json()

# wrangle
items <- pluck(resp, "items")
tibble(
  id                    = map_int(items, function(i) safe_pluck(i, "value", "id")),
  nameListAs            = map_chr(items, function(i) safe_pluck(i, "value", "nameListAs")),
  nameDisplayAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameDisplayAs")),
  nameFullTitle         = map_chr(items, function(i) safe_pluck(i, "value", "nameFullTitle")),
  nameAddressAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameAddressAs")),
  gender                = map_chr(items, function(i) safe_pluck(i, "value", "gender")),
  latestParty           = map(items, function(i) safe_pluck(i, "value", "latestParty")),
  latestHouseMembership = map(items, function(i) safe_pluck(i, "value", "latestHouseMembership"))
)
# A tibble: 20 × 8
      id nameListAs            nameDisplayAs  nameFullTitle nameAddressAs gender
   <int> <chr>                 <chr>          <chr>         <chr>         <chr> 
 1  1267 Smith, Sir Dudley     Sir Dudley Sm… Sir Dudley S… <NA>          M     
 2  4609 Smith, Eleanor        Eleanor Smith  Eleanor Smith Eleanor Smith F     
 3   471 Smith, Geraldine      Geraldine Smi… Geraldine Sm… <NA>          F     
 4  4778 Smith, Greg           Greg Smith     Greg Smith MP <NA>          M     
 5  3960 Smith, Henry          Henry Smith    Henry Smith   Henry Smith   M     
 6  4456 Smith, Jeff           Jeff Smith     Jeff Smith MP Jeff Smith    M     
 7   681 Smith, John           John Smith     John Smith    John Smith    M     
 8   564 Smith, Mr John        Mr John Smith  Rt Hon John … <NA>          M     
 9  4118 Smith, Sir Julian     Sir Julian Sm… Rt Hon Sir J… <NA>          M     
10  2852 Smith, L.             The Lord Smith The Rt Hon. … <NA>          M     
11  4648 Smith, Laura          Laura Smith    Laura Smith   Laura Smith   F     
12   541 Smith, Llew           Llew Smith     Llew Smith    <NA>          M     
13  3928 Smith, Nick           Nick Smith     Nick Smith MP Nick Smith    M     
14  4042 Smith, Owen           Owen Smith     Owen Smith    Owen Smith    M     
15  5301 Smith, Rebecca        Rebecca Smith  Rebecca Smit… <NA>          F     
16   639 Smith, Sir Robert     Sir Robert Sm… Sir Robert S… Sir Robert    M     
17  4478 Smith, Royston        Royston Smith  Royston Smith Royston Smith M     
18  5117 Smith, Sarah          Sarah Smith    Sarah Smith … <NA>          F     
19  1245 Smith, Timothy        Timothy Smith  Timothy Smith <NA>          M     
20  4170 Smith of Basildon, B. Baroness Smit… The Rt Hon. … The Rt Hon. … F     
# ℹ 2 more variables: latestParty <list>, latestHouseMembership <list>

Wrapping the endpoint in a function: add pagination

Based on this we can adapt the function:

search_members <- function(name) {
  
  # make the initial request
  resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
    req_method("GET") |>
    req_url_query(
      Name = name,
      take = 20,
      skip = 0
    ) |> 
    req_headers(
      accept = "text/plain",
    ) |>
    req_perform() |> 
    resp_body_json()
  
  # checking the total and setting things up for pagination
  total <- resp$totalResults
  message(total, " results found")
  skip <- 0
  page <- 1
  
  # extract initial results
  items <- pluck(resp, "items")
  
  # while loops are repeated until the condition inside is FALSE;
  # we stop once the records requested so far (skip + 20) cover the total
  while (total > skip + 20) { 
    
    skip <- skip + 20
    page <- page + 1
    
    # we print a little status message to let the user know work is ongoing
    message("\t...fetching page ", page)
    
    # we retrieve the next page by adding an increasing skip
    resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
      req_method("GET") |>
      req_url_query(
        Name = name,
        skip = skip,
        take = 20
      ) |> 
      req_headers(
        accept = "text/plain",
      ) |>
      req_throttle(rate = 1) |> # do not make more than one request per second
      req_perform() |> 
      resp_body_json()
    
    # we append the original result with the new items
    items <- c(items, pluck(resp, "items"))
    
  }
  
  # wrangle
  return(tibble(
    id                    = map_int(items, function(i) safe_pluck(i, "value", "id")),
    nameListAs            = map_chr(items, function(i) safe_pluck(i, "value", "nameListAs")),
    nameDisplayAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameDisplayAs")),
    nameFullTitle         = map_chr(items, function(i) safe_pluck(i, "value", "nameFullTitle")),
    nameAddressAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameAddressAs")),
    gender                = map_chr(items, function(i) safe_pluck(i, "value", "gender")),
    latestParty           = map(items, function(i) safe_pluck(i, "value", "latestParty")),
    latestHouseMembership = map(items, function(i) safe_pluck(i, "value", "latestHouseMembership"))
  ))
  
}
search_members("Smith")
# A tibble: 50 × 8
      id nameListAs             nameDisplayAs nameFullTitle nameAddressAs gender
   <int> <chr>                  <chr>         <chr>         <chr>         <chr> 
 1   727 Buchanan-Smith, Alick  Alick Buchan… Rt Hon Alick… <NA>          M     
 2  4756 Clarke-Smith, Brendan  Brendan Clar… Brendan Clar… <NA>          M     
 3  2723 Delacourt-Smith of Al… Baroness Del… The Baroness… <NA>          F     
 4  2713 Dixon-Smith, L.        Lord Dixon-S… The Lord Dix… The Lord Dix… M     
 5   152 Duncan Smith, Sir Iain Sir Iain Dun… Rt Hon Sir I… Sir Iain Dun… M     
 6  2490 Goldsmith, L.          Lord Goldsmi… The Rt Hon. … <NA>          M     
 7  4062 Goldsmith of Richmond… Lord Goldsmi… The Right Ho… <NA>          M     
 8    29 Johnson Smith, Sir Ge… Sir Geoffrey… Sir Geoffrey… <NA>          M     
 9  5341 Kyrke-Smith, Laura     Laura Kyrke-… Laura Kyrke-… <NA>          F     
10  4554 McGregor-Smith, B.     Baroness McG… The Baroness… <NA>          F     
# ℹ 40 more rows
# ℹ 2 more variables: latestParty <list>, latestHouseMembership <list>
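
The wrangling step is identical every time it appears, so one option is to move it into its own helper. This is a minimal sketch under a few assumptions: parse_members is a hypothetical name, only some of the columns are shown, the mock items are made up, and typed NA defaults are used instead of safe_pluck so that each map_* call receives the type it expects.

```r
library(purrr)
library(tibble)

# turn the items of a Members/Search response into a tibble
parse_members <- function(items) {
  tibble(
    id          = map_int(items, function(i) pluck(i, "value", "id", .default = NA_integer_)),
    nameListAs  = map_chr(items, function(i) pluck(i, "value", "nameListAs", .default = NA_character_)),
    gender      = map_chr(items, function(i) pluck(i, "value", "gender", .default = NA_character_)),
    latestParty = map(items, function(i) pluck(i, "value", "latestParty", .default = NA))
  )
}

# mock items standing in for pluck(resp, "items"); the second lacks some fields
mock_items <- list(
  list(value = list(id = 1L, nameListAs = "Example, A.", gender = "F",
                    latestParty = list(name = "Party X"))),
  list(value = list(id = 2L, nameListAs = "Sample, B."))
)

members <- parse_members(mock_items)
members
```

With such a helper, the request code and the parsing code can be changed and tested independently.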

Adding more parameters

  • The documentation lists a whole lot of other parameters.
  • We can copy them into the function to employ them when calling the API.
  • We can set the defaults to NULL, which means they are ignored by req_url_query when not used
  • The documentation usually lists which parameters are required; for those, you shouldn’t set a default
search_members <- function(name = NULL,
                           location = NULL,
                           posttitle = NULL,
                           partyid = NULL,
                           house = NULL,
                           constituencyid = NULL,
                           namestartswith = NULL,
                           gender = NULL,
                           membershipstartedsince = NULL,
                           membershipended_membershipendedsince = NULL,
                           membershipended_membershipendreasonids = NULL,
                           membershipindaterange_wasmemberonorafter = NULL,
                           membershipindaterange_wasmemberonorbefore = NULL,
                           membershipindaterange_wasmemberofhouse = NULL,
                           iseligible = NULL,
                           iscurrentmember = NULL,
                           policyinterestid = NULL,
                           experience = NULL) {
  
  # 1. request
  resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
    req_method("GET") |>
    req_url_query(
      Name = name,
      Location = location,
      PostTitle = posttitle,
      PartyId = partyid,
      House = house,
      ConstituencyId = constituencyid,
      NameStartsWith = namestartswith,
      Gender = gender,
      MembershipStartedSince = membershipstartedsince,
      MembershipEnded.MembershipEndedSince = membershipended_membershipendedsince,
      MembershipEnded.MembershipEndReasonIds = membershipended_membershipendreasonids,
      MembershipInDateRange.WasMemberOnOrAfter = membershipindaterange_wasmemberonorafter,
      MembershipInDateRange.WasMemberOnOrBefore = membershipindaterange_wasmemberonorbefore,
      MembershipInDateRange.WasMemberOfHouse = membershipindaterange_wasmemberofhouse,
      IsEligible = iseligible,
      IsCurrentMember = iscurrentmember,
      PolicyInterestId = policyinterestid,
      Experience = experience,
      take = 20
    ) |> 
    req_headers(
      accept = "text/plain",
    ) |>
    req_perform() |> 
    # 2. Parse
    resp_body_json()
  
  # checking the total and setting things up for pagination
  total <- resp$totalResults
  message(total, " results found")
  skip <- 20
  page <- 1
  
  # extract initial results
  items <- pluck(resp, "items")
  
  # while loops are repeated until the condition inside is FALSE
  while (total > skip) { 
    page <- page + 1
    
    # we print a little status message to let the user know work is ongoing
    message("\t...fetching page ", page)
    
    # we retrieve the next page by adding an increasing skip
    resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
      req_method("GET") |>
      req_url_query(
        Name = name,
        Location = location,
        PostTitle = posttitle,
        PartyId = partyid,
        House = house,
        ConstituencyId = constituencyid,
        NameStartsWith = namestartswith,
        Gender = gender,
        MembershipStartedSince = membershipstartedsince,
        MembershipEnded.MembershipEndedSince = membershipended_membershipendedsince,
        MembershipEnded.MembershipEndReasonIds = membershipended_membershipendreasonids,
        MembershipInDateRange.WasMemberOnOrAfter = membershipindaterange_wasmemberonorafter,
        MembershipInDateRange.WasMemberOnOrBefore = membershipindaterange_wasmemberonorbefore,
        MembershipInDateRange.WasMemberOfHouse = membershipindaterange_wasmemberofhouse,
        IsEligible = iseligible,
        IsCurrentMember = iscurrentmember,
        PolicyInterestId = policyinterestid,
        Experience = experience,
        take = 20,
        skip = skip
      ) |> 
      req_headers(
        accept = "text/plain",
      ) |>
      req_perform() |> 
      # 2. Parse
      resp_body_json()
    
    # we append the original result with the new items
    items <- c(items, pluck(resp, "items"))
    
    # increase the skip number
    skip <- skip + 20
  }
  
  # wrangle
  return(tibble(
    id                    = map_int(items, function(i) safe_pluck(i, "value", "id")),
    nameListAs            = map_chr(items, function(i) safe_pluck(i, "value", "nameListAs")),
    nameDisplayAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameDisplayAs")),
    nameFullTitle         = map_chr(items, function(i) safe_pluck(i, "value", "nameFullTitle")),
    nameAddressAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameAddressAs")),
    gender                = map_chr(items, function(i) safe_pluck(i, "value", "gender")),
    latestParty           = map(items, function(i) safe_pluck(i, "value", "latestParty")),
    latestHouseMembership = map(items, function(i) safe_pluck(i, "value", "latestHouseMembership"))
  ))
  
}
search_members("Smith", partyid = 4, house = 1, gender = "M", iscurrentmember = TRUE)
# A tibble: 3 × 8
     id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender latestParty 
  <int> <chr>      <chr>         <chr>         <chr>         <chr>  <list>      
1   152 Duncan Sm… Sir Iain Dun… Rt Hon Sir I… Sir Iain Dun… M      <named list>
2  4778 Smith, Gr… Greg Smith    Greg Smith MP <NA>          M      <named list>
3  4118 Smith, Si… Sir Julian S… Rt Hon Sir J… <NA>          M      <named list>
# ℹ 1 more variable: latestHouseMembership <list>
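
To see the effect of the NULL defaults without contacting the API, we can build a request and inspect its URL. A minimal sketch, where build_search is a hypothetical, cut-down stand-in for the function above:

```r
library(httr2)

# only builds the request; nothing is sent
build_search <- function(name = NULL, house = NULL) {
  request("https://members-api.parliament.uk/api/Members/Search") |>
    req_url_query(
      Name = name,
      House = house,
      take = 20
    )
}

# house is left at NULL, so it should not show up in the query string
url <- build_search(name = "Smith")$url
url
```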

Adding documentation

In its current form, the function works well, but to find out what the parameters do, you would have to visit the documentation website, which isn’t great. To make the function more useful, we should add some documentation. In R, the roxygen2 package handles parsing documentation for packages. We can use it here to add explanations of the parameters. You can easily add a roxygen skeleton to a function in RStudio via the Code menu and Insert Roxygen Skeleton:

#' Search for members of the UK Parliament
#'
#' @param name Members where name contains term specified
#' @param location Members where postcode or geographical location matches the term specified
#' @param posttitle Members which have held the post specified
#' @param partyid Members which are currently affiliated with party with party ID
#' @param house Members where their most recent house is the house specified (1 for Commons, 2 for Lords)
#' @param constituencyid Members which currently hold the constituency with constituency id
#' @param namestartswith Members with surname beginning with letter(s) specified
#' @param gender Members with the gender specified
#' @param membershipstartedsince Members who started on or after the date given
#' @param membershipended_membershipendedsince Members who left the House on or after the date given
#' @param membershipended_membershipendreasonids 
#' @param membershipindaterange_wasmemberonorafter Members who were active on or after the date specified
#' @param membershipindaterange_wasmemberonorbefore Members who were active on or before the date specified
#' @param membershipindaterange_wasmemberofhouse Members who were active in the house specified (1 for Commons, 2 for Lords)
#' @param iseligible Members currently Eligible to sit in their House
#' @param iscurrentmember TRUE gives you members who are current
#' @param policyinterestid Members with specified policy interest
#' @param experience Members with specified experience
#'
#' @return
#' @export
#' 
#'
#' @examples
search_members <- function(name = NULL,
                           location = NULL,
                           posttitle = NULL,
                           partyid = NULL,
                           house = NULL,
                           constituencyid = NULL,
                           namestartswith = NULL,
                           gender = NULL,
                           membershipstartedsince = NULL,
                           membershipended_membershipendedsince = NULL,
                           membershipended_membershipendreasonids = NULL,
                           membershipindaterange_wasmemberonorafter = NULL,
                           membershipindaterange_wasmemberonorbefore = NULL,
                           membershipindaterange_wasmemberofhouse = NULL,
                           iseligible = NULL,
                           iscurrentmember = NULL,
                           policyinterestid = NULL,
                           experience = NULL) {
  
  # ...
  
}
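
For comparison, this is roughly what a filled-in roxygen block could look like, shown for the small safe_pluck helper from earlier. The wording of the tags is only a suggestion, not taken from any package:

```r
library(purrr)

#' Extract a nested field with a missing-value fallback
#'
#' @param ... Passed on to purrr::pluck(): the list, followed by the names or
#'   positions to descend through
#'
#' @return The extracted element, or NA if the path does not exist
#'
#' @examples
#' safe_pluck(list(value = list(id = 1)), "value", "id")
#' safe_pluck(list(value = list(id = 1)), "value", "missing")
safe_pluck <- function(...) {
  pluck(..., .default = NA)
}

found <- safe_pluck(list(value = list(id = 1)), "value", "id")
not_found <- safe_pluck(list(value = list(id = 1)), "value", "missing")
```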

Exercises 2

First, review the material and make sure you have a broad understanding of:

  • how to read the documentation of the UK Parliament API (the documentation is specific to the API, but the Swagger format they use is very common)
  • how to translate a curl call
  • what the individual parts of the search_members function are doing

To get more information about an MP, we can use the endpoint “/api/Members/{id}/Biography”

  1. Search for an MP you are interested in with the function above and use the id on the documentation website with “Try it out”
  2. Copy the Curl call and translate it into httr2 code
  3. Wrangle the returned data into a tabular format

Bonus:

  1. Write a function which lets you request information given an ID and which wrangles the results
  2. Two more interesting endpoints are “/api/Posts/GovernmentPosts” and “/api/Posts/OppositionPosts”. What do they do and how can you request data from them?

Example: Semantic Scholar

What do we want

  • Get information about scholars
  • We want to use Semantic Scholar
    • Semantic Scholar collects scientific papers and their authors
    • Semantic Scholar API supports Paper and Author Lookup

Exploring the documentation

  • The documentation for the API can be found here: https://api.semanticscholar.org/api-docs/graph
  • It is shown in the other common documentation format, called ReDoc
  • I personally prefer Swagger. Conveniently, the Swagger view can be produced from the OpenAPI specification linked on the website (you can stick with ReDoc though if you want)
  • There is a tool in R which opens a small server on your computer that can display OpenAPI specifications in the swagger format
library(swagger)
browseURL(swagger_index())

Making a first request

We can use one of the examples and convert it into httr2:

res <- request("https://api.semanticscholar.org/graph/v1/author/search?query=adam+smith") |> 
  req_perform() |> 
  resp_body_json()
View(res)

Parsing the initial request

We note two pieces of meta information that will be helpful later on:

pluck(res, "total")
[1] 585
pluck(res, "next")
[1] 100

The actual data sits in data and is a pretty well behaved list that we can just convert to a tibble:

res_data <- pluck(res, "data") |> 
  bind_rows()
res_data
# A tibble: 100 × 2
   authorId   name            
   <chr>      <chr>           
 1 39765778   Adam D. Smith   
 2 2109352620 Adam M. Smith   
 3 2109352729 Adam B. Smith   
 4 2276184838 Adam B Smith    
 5 2118081662 A. Smith        
 6 39872837   Adam Smith      
 7 2109352685 Adam C. Smith   
 8 2128824945 Adam N. H. Smith
 9 2170968519 Adam W. Smith   
10 2109352898 Adam T. Smith   
# ℹ 90 more rows

However, the information seems a bit sparse… But we’ll look at that later.

Wrapping the endpoint in a function and add pagination

First we wrap this in a function and add pagination to get all results:

find_scholar <- function(name,
                         verbose = TRUE) {
  
  # make initial request
  res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
    req_url_query(query = name) |> 
    req_perform() |> 
    resp_body_json()
  
  # note total
  total <- pluck(res, "total")
  # display user message
  if (verbose) {
    message("Found ", total, " authors")
  }
  # note offset
  nxt <- pluck(res, "next")
  # wrangle initial data
  data <- pluck(res, "data") |> 
    bind_rows()
  page <- 1
  
  #----- New Stuff -----#
  
  # loop through pages until no new ones exist
  while (!is.null(nxt)) { # if there are no more results, next is NULL
    page <- page + 1
    if (verbose) {
      message("\t...fetching page ", page)
    }
    res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
      req_url_query(query = name,
                    offset = nxt) |> 
      req_throttle(rate = 30 / 60) |> # make only 30 requests per minute
      req_perform() |> 
      resp_body_json()
    
    # get next offset; will be NULL on the last page
    nxt <- pluck(res, "next")
    
    data_new <- pluck(res, "data") |> 
      bind_rows()
    data <- data |> 
      bind_rows(data_new)
  }
  
  return(data)
}
find_scholar("Adam Smith")
# A tibble: 585 × 2
   authorId   name            
   <chr>      <chr>           
 1 39765778   Adam D. Smith   
 2 2118081662 A. Smith        
 3 2109352648 Adam W. Smith   
 4 2216980146 A. Smith        
 5 39872837   Adam Smith      
 6 2109352685 Adam C. Smith   
 7 2128824945 Adam N. H. Smith
 8 2170968519 Adam W. Smith   
 9 2109352898 Adam T. Smith   
10 2109352821 Adam L. Smith   
# ℹ 575 more rows
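
The loop logic can be tried out without the real API. In this sketch, fake_search is a made-up stand-in that mimics the response shape described above (total, data, and a next offset that is absent on the last page):

```r
# hypothetical stand-in for the API: returns up to `limit` records plus `next`
fake_search <- function(offset = 0, limit = 2, total = 5) {
  idx <- seq_len(total)
  idx <- idx[idx > offset & idx <= offset + limit]
  out <- list(
    total = total,
    data = lapply(idx, function(i) list(authorId = as.character(i)))
  )
  # `next` is only included while there are more records to fetch
  if (offset + limit < total) {
    out[["next"]] <- offset + limit
  }
  out
}

res <- fake_search()
data <- res[["data"]]
nxt <- res[["next"]]

# keep requesting until the response no longer contains a next offset
while (!is.null(nxt)) {
  res <- fake_search(offset = nxt)
  data <- c(data, res[["data"]])
  nxt <- res[["next"]]
}

length(data)
```

Since next is a reserved word in R, the element is accessed with [["next"]] (or pluck) rather than with $next.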

So where is the rest of the data?

  • Semantic Scholar only returns authorId and name by default.
  • But we also want papers.
  • The API handles this through the fields parameter and you can request additional fields
  • The given example is https://api.semanticscholar.org/graph/v1/author/search?query=adam+smith&fields=name,aliases,url,papers.title,papers.year

We are only interested in some of the fields, so let’s build a new request and see what we get:

resp <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
  req_url_query(query = "Adam Smith") |>
  req_url_query(fields = "name,papers.title,papers.year,papers.fieldsOfStudy,papers.authors",
                limit = 10) |> 
  req_headers(accept = "application/json") |> 
  req_perform() |> 
  resp_body_json()
View(resp)

This structure is a lot more demanding since we have nested content (authors inside papers inside scholars).

Wrangling the data

For most of the wrangling here, we can use the unnest_ functions from the tidyverse:

adam_search <- pluck(resp, "data") |>
  # bind initial data into a tibble
  bind_rows() |>
  # unnest papers list into columns
  unnest_wider(papers) |> 
  # unnest authors into rows
  unnest(authors) |> 
  # unnest the new authors into columns
  unnest_wider(authors, names_sep = "_") |> 
  # fieldsOfStudy is a list within a list, so we call unnest twice
  unnest(fieldsOfStudy, keep_empty = TRUE) |> 
  unnest(fieldsOfStudy, keep_empty = TRUE)

We now get several useful columns including the field of study of a paper (which we could use to differentiate between different authors with the same name).

adam_search
# A tibble: 2,355 × 8
   authorId name          paperId     title  year fieldsOfStudy authors_authorId
   <chr>    <chr>         <chr>       <chr> <int> <chr>         <chr>           
 1 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          15089134        
 2 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          39765778        
 3 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          7430051         
 4 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          4704115         
 5 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          3429443         
 6 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          32546788        
 7 39765778 Adam D. Smith 0671806ef9… Fort…  2023 <NA>          49608903        
 8 39765778 Adam D. Smith 0671806ef9… Fort…  2023 <NA>          39765778        
 9 39765778 Adam D. Smith 44fc62276b… US A…  2023 <NA>          2221731705      
10 39765778 Adam D. Smith 44fc62276b… US A…  2023 <NA>          17317553        
# ℹ 2,345 more rows
# ℹ 1 more variable: authors_name <chr>
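
The unnesting steps are easier to follow on a tiny example. This sketch uses made-up data with one level of nesting (papers inside scholars) and unnest_longer, which is a slightly different route than the pipeline above:

```r
library(tibble)
library(tidyr)

# mock data: each scholar has a list of papers, each paper is a named list
scholars <- tibble(
  name = c("A. Example", "B. Sample"),
  papers = list(
    list(
      list(title = "Paper 1", year = 2020L),
      list(title = "Paper 2", year = 2021L)
    ),
    list(
      list(title = "Paper 3", year = 2022L)
    )
  )
)

flat <- scholars |>
  unnest_longer(papers) |> # one row per paper
  unnest_wider(papers)     # one column per paper field

flat
```

Rows multiply with every level that is unnested into rows, which is why the real result above has several rows per paper (one per co-author).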

Let’s put it all together in an extended function

find_scholar <- function(name, 
                         fields = "name,papers.title,papers.year,papers.fieldsOfStudy,papers.authors",
                         limit = 100) {
  
  # make initial request
  res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
    req_url_query(query = name) |>
    req_url_query(fields = fields,
                  limit = limit) |> 
    req_headers(accept = "application/json") |> 
    req_perform() |> 
    resp_body_json()
  
  # note total
  total <- pluck(res, "total")
  # display user message
  message("Found ", total, " authors")
  # note offset
  nxt <- pluck(res, "next")
  
  # wrangle initial data
  data <- parse_response(res)
  page <- 1
  
  # loop through pages until no new ones exist
  while (!is.null(nxt)) {
    page <- page + 1
    message("\t...fetching page ", page)

    res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
      req_url_query(query = name,
                    offset = nxt,
                    fields = fields,
                    limit = limit) |> 
      req_throttle(rate = 30 / 60) |> # make only 30 requests per minute
      req_headers(accept = "application/json") |> 
      req_perform() |> 
      resp_body_json()
    
    # get next offset; will be NULL on the last page
    nxt <- pluck(res, "next")
    
    # wrangle the new page the same way as the first one and append it
    data_new <- parse_response(res)
    data <- data |> 
      bind_rows(data_new)
  }
  
  return(data)
}

I moved the parsing into its own function, parse_response(), to make find_scholar() easier to read.

parse_response <- function(resp) {
  pluck(resp, "data") |>
    # bind initial data into a tibble
    bind_rows() |>
    # unnest papers list into columns
    unnest_wider(papers) |> 
    # unnest authors into rows
    unnest(authors) |> 
    # unnest the new authors into columns
    unnest_wider(authors, names_sep = "_") |> 
    # fieldsOfStudy is a list within a list, so we call unnest twice
    unnest(fieldsOfStudy, keep_empty = TRUE) |> 
    unnest(fieldsOfStudy, keep_empty = TRUE)
}
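
To see what each unnesting step does without calling the API, the parser can be run on a small hand-made response. The sketch below repeats parse_response() so it runs on its own; the toy list and all names in it (Jane Doe, John Roe, the IDs) are invented, and its shape is only assumed to mirror what resp_body_json() returns for this endpoint.

```r
library(dplyr)
library(purrr)
library(tidyr)

# same steps as parse_response() on the slide, repeated to keep this self-contained
parse_response <- function(resp) {
  pluck(resp, "data") |>
    bind_rows() |>
    unnest_wider(papers) |>
    unnest(authors) |>
    unnest_wider(authors, names_sep = "_") |>
    unnest(fieldsOfStudy, keep_empty = TRUE) |>
    unnest(fieldsOfStudy, keep_empty = TRUE)
}

# invented toy response: one author with two papers;
# the second paper has no field of study (NULL, like a JSON null)
toy <- list(
  total = 1,
  data = list(
    list(
      authorId = "a1",
      name = "Jane Doe",
      papers = list(
        list(paperId = "p1", title = "First paper", year = 2022L,
             fieldsOfStudy = list("Medicine"),
             authors = list(list(authorId = "a1", name = "Jane Doe"),
                            list(authorId = "a2", name = "John Roe"))),
        list(paperId = "p2", title = "Second paper", year = 2023L,
             fieldsOfStudy = NULL,
             authors = list(list(authorId = "a1", name = "Jane Doe")))
      )
    )
  )
)

# one row per combination of paper and co-author
toy_parsed <- parse_response(toy)
toy_parsed
```

The paper without a field of study is kept because of keep_empty = TRUE; without it, that row would be dropped.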

Let’s test it with Ryan:

find_scholar("Ryan Bakker")
# A tibble: 506 × 8
   authorId  name        paperId      title  year fieldsOfStudy authors_authorId
   <chr>     <chr>       <chr>        <chr> <int> <chr>         <chr>           
 1 114790016 Ryan Bakker bd0caaea740… The …  2023 Medicine      118473903       
 2 114790016 Ryan Bakker bd0caaea740… The …  2023 Medicine      144919740       
 3 114790016 Ryan Bakker bd0caaea740… The …  2023 Medicine      2068978814      
 4 114790016 Ryan Bakker bd0caaea740… The …  2023 Medicine      114790016       
 5 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      101273729       
 6 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      114790016       
 7 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      50674874        
 8 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      118950061       
 9 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      49274136        
10 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      144779957       
# ℹ 496 more rows
# ℹ 1 more variable: authors_name <chr>
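
The pagination logic inside find_scholar() can also be tried offline. The following is a minimal sketch in base R: fake_search() is a hypothetical stand-in for the API (not part of httr2 or Semantic Scholar) that serves five invented records and, as assumed for the real endpoint, only includes a next offset while more records remain.

```r
# invented records served by the hypothetical fake API
fake_records <- paste0("author_", 1:5)

# hypothetical stand-in for one API call: returns one page of records
# and a `next` offset unless this was the last page
fake_search <- function(offset = 0, limit = 2) {
  start <- offset + 1
  end <- min(offset + limit, length(fake_records))
  res <- list(total = length(fake_records),
              offset = offset,
              data = as.list(fake_records[start:end]))
  if (end < length(fake_records)) {
    res[["next"]] <- end
  }
  res
}

collect_all <- function(limit = 2) {
  # initial request
  res <- fake_search(offset = 0, limit = limit)
  data <- unlist(res$data)
  nxt <- res[["next"]]
  pages <- 1
  # keep requesting until the response no longer contains `next`
  while (!is.null(nxt)) {
    res <- fake_search(offset = nxt, limit = limit)
    data <- c(data, unlist(res$data))
    nxt <- res[["next"]]
    pages <- pages + 1
  }
  list(data = data, pages = pages)
}

out <- collect_all(limit = 2)
out$pages
out$data
```

Because the loop only checks whether a next offset exists, it never needs to know the total number of records in advance.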

Exercises 3

First, review the material. This example is pretty similar to the last one. But:

  • it uses a different documentation style called ReDoc, which does not give you curl calls to copy
  • it uses a different kind of pagination: instead of calculating the number of pages from the total number of items, we follow the next offset until the API no longer returns one
  • we throttle the rate of requests (30 per minute)

  1. Document the function we just created. This is mainly to make you think about the parameters and how you would explain how they work to someone else.
  2. Use the function to search for a couple of scholars of your choice. Who has the most co-authors and unique papers?
  3. Say you found an author’s ID with the search function. How could you use “/author/{author_id}” and “/author/{author_id}/papers” to request more information about them?

Bonus:

  1. Write a function that wraps “/author/{author_id}”

Wrap Up

Save some information about the session for reproducibility.

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: EndeavourOS

Matrix products: default
BLAS:   /usr/lib/libblas.so.3.12.0 
LAPACK: /usr/lib/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rvest_1.0.4        httr2_1.0.1        lubridate_1.9.3    forcats_1.0.0     
 [5] stringr_1.5.1      dplyr_1.1.4        purrr_1.0.2        readr_2.1.5       
 [9] tidyr_1.3.1        tibble_3.2.1       ggplot2_3.5.1      tidyverse_2.0.0   
[13] tinytable_0.3.0.10

loaded via a namespace (and not attached):
 [1] rappdirs_0.3.3    utf8_1.2.4        generics_0.1.3    xml2_1.3.6       
 [5] stringi_1.8.4     hms_1.1.3         digest_0.6.35     magrittr_2.0.3   
 [9] evaluate_0.23     grid_4.4.1        timechange_0.3.0  fastmap_1.1.1    
[13] lobstr_1.1.2      jsonlite_1.8.8    processx_3.8.4    chromote_0.2.0   
[17] ps_1.7.7          promises_1.3.0    httr_1.4.7        fansi_1.0.6      
[21] scales_1.3.0      cli_3.6.3         rlang_1.1.4       crayon_1.5.2     
[25] docopt_0.7.1      munsell_0.5.1     withr_3.0.0       yaml_2.3.8       
[29] tools_4.4.1       tzdb_0.4.0        colorspace_2.1-0  curl_5.2.1       
[33] vctrs_0.6.5       R6_2.5.1          lifecycle_1.0.4   pkgconfig_2.0.3  
[37] pillar_1.9.0      later_1.3.2       gtable_0.3.5      Rcpp_1.0.12      
[41] glue_1.7.0        xfun_0.44         tidyselect_1.2.1  rstudioapi_0.16.0
[45] knitr_1.46        websocket_1.4.1   htmltools_0.5.8.1 rmarkdown_2.26   
[49] compiler_4.4.1
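
If you want to keep this information alongside your results rather than only printing it, one option is to write it to a text file. This is a minimal sketch in base R; the file name session_info.txt and the use of tempdir() are arbitrary choices for illustration, and in a real project you would probably point to a folder inside the project instead.

```r
# capture the printed session information as lines of text
info <- capture.output(sessionInfo())

# write it to a file; swap tempdir() for a folder in your project
out_file <- file.path(tempdir(), "session_info.txt")
writeLines(info, out_file)
```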